1. Introduction

In the last decade, networks have arisen in numerous domains such as social 
sciences and biology. They provide an attractive graphical representation of 
complex data. However, the increasing size of networks and their great number 
of connections have made it difficult to interpret network representations of data 
in a satisfactory way. This has strengthened the need for statistical analysis of 
such networks, which could reveal latent patterns in the data.

Interpreting networks as realizations of random graphs, unsupervised
classification (clustering) of the vertices of the graph has received much
attention. It is based on the idea that vertices with a similar connectivity can
be gathered in the same class. The initial graph can then be replaced by a
simpler one without losing too much information. This idea has been successfully
applied to social (Nowicki and Snijders, 2001) and biological (Picard et al.,
2009) networks. It is out of the scope of the present work to review all of them.

Mixture models are a convenient and classical tool to perform unsupervised 
classification in usual statistical settings. Mixture models for random graphs 
were first proposed by Holland et al. (1983) who defined the so-called stochastic 
block model (SBM), in reference to an older non stochastic block model widely 
used in social science. Assuming each vertex belongs to only one class, a latent 
variable (called the label) assigns every vertex to its corresponding class. SBM 
is a versatile means to infer underlying structures of the graph. Subsequently, 
several versions of SBM have been studied and it is necessary to formally dis- 
tinguish between them. Three binary distinctions can be made to this end: 

1. The graph can be directed or undirected. 

2. The graph can be binary or weighted. 

3. The model can (i) rely on latent random variables (the labels), or (ii)
assume the labels are unknown parameters:

(i) SBM is a usual mixture model with random multinomial latent
variables (Nowicki and Snijders, 2001; Daudin et al., 2008;
Ambroise and Matias, 2012). In this model, vertices are sampled in
a population and the focus is on the population parameters, that is the
frequency of each class and their connectivity parameters.

(ii) An alternative conditional version of SBM (called CSBM) has been
studied (Bickel and Chen, 2009). In CSBM, the former latent random
variables (the labels) are considered as fixed parameters. The main concerns
are then the estimation of between- and within-class connectivity
parameters as well as of the unknown label associated to every vertex (see
Rohe et al., 2011; Choi et al., 2012).

The present work deals with directed (and undirected) binary edges in random 
graphs drawn from SBM. 

The main interest of SBM is that it provides a more realistic and versatile
model than the famous Erdős-Rényi graph while remaining easily interpretable.
However, unlike usual statistical settings where independence is assumed, one
specificity of SBM is that vertices are not independent. Numerous approaches
have been developed to overcome this challenging problem, but most of them
suffer from a high computational cost. For instance Snijders and Nowicki (1997)
study maximum-likelihood estimators of SBM with only two classes and binary 
undirected graphs, while Nowicki and Snijders (2001) perform Gibbs sampling 
for more than two classes at the price of a large computational cost. Other 
strategies also exist relying for instance on profile-likelihood optimization in 
CSBM (Bickel and Chen, 2009), on a spectral clustering algorithm in CSBM 
(Rohe et al., 2011), or on moment estimation in a particular instance of SBM 
called affiliation model (Ambroise and Matias, 2012, and also Example 1 in the 
present paper). 

A variational approach has been proposed by Daudin et al. (2008) to rem- 
edy this computational burden. It can be used with binary directed SBM and 
avoids the algorithmic complexity of the likelihood and Bayesian approaches 
(see Mixnet (2009) and also Mariadassou et al. (2010) for weighted undirected 
SBM analyzed with a variational approach). However, even if its practical
performance shows a great improvement, the variational approach remains poorly
understood from a theoretical point of view. For instance, no consistency result
exists yet for maximum likelihood or variational estimators of SBM
parameters. Note however that consistency results for maximum likelihood
estimators in the CSBM have been derived recently by Choi et al. (2012) where
the number of groups is allowed to grow with the number of vertices. Nonethe- 
less, empirical clues (Gazal et al., 2011) have already supported the consistency 
of variational estimators in SBM. Establishing such asymptotic properties is 
precisely the purpose of the present work. 

In this paper the identifiability of binary directed SBM is proved under very 
mild assumptions for the first time to our knowledge. Note that identifiability 
of directed SBM is really challenging since existing strategies such as that of 
Allman et al. (2009) cannot be extended easily. The asymptotics of maximum- 
likelihood and variational estimators is also addressed by use of concentration 
inequalities. In particular, variational estimators are shown to be
asymptotically equivalent to maximum-likelihood ones, and consistent for
estimating the probability $\pi$ of an edge between two vertices. When estimating
the group proportions $\alpha$, an additional assumption on the convergence rate
of the estimator of $\pi$ is required
to derive consistency. The present framework assumes the number Q of classes
to be known and independent of the number of vertices. Some attempts exist to 
provide a data-driven choice of Q (see Daudin et al., 2008), but this question is 
out of the scope of the present work. 

The rest of the paper is organized as follows. The main notation and as- 
sumptions are introduced in Section 2 where identifiability of SBM is settled. 
Section 3 is devoted to the consistency of the maximum-likelihood estimators 
(MLE), and Section 4 to the asymptotic equivalence between variational and 
maximum-likelihood estimators. In particular, the consistency of variational es- 
timators (VE) is proved. Some concluding remarks are provided in Section 5 
with some further important questions. 
2. Model definition and identifiability

Let $\Omega = (\mathcal{V}, \mathcal{X})$ be the set of infinite random graphs,
where $\mathcal{V} = \mathbb{N}$ denotes the set of countable vertices and
$\mathcal{X} = \{0,1\}^{\mathbb{N}^2}$ the corresponding set of adjacency
matrices. The random adjacency matrix, denoted by
$X = \{X_{i,j}\}_{i,j \in \mathbb{N}}$, is given by: for $i \neq j$,
$X_{i,j} = 1$ if an edge exists from vertex $i$ to vertex $j$ and $X_{i,j} = 0$
otherwise, and $X_{i,i} = 0$ (no loop). Let $\mathbb{P}$ denote a probability
measure on $\Omega$.

2.1. Stochastic Block Model (SBM) 

Let us consider a random graph with $n$ vertices $\{v_i\}_{i=1,\dots,n}$. These
vertices are assumed to be split into $Q$ classes $\{C_q\}_{q=1,\dots,Q}$
depending on their structural properties.

Set $\alpha = (\alpha_1, \dots, \alpha_Q)$ with $0 < \alpha_q < 1$ and
$\sum_q \alpha_q = 1$. For every $q$, $\alpha_q$ denotes the probability for a
given vertex to belong to the class $C_q$. For any vertex $v_i$, its label $Z_i$
is generated as follows

\[ \{Z_i\}_{1 \le i \le n} \overset{\text{i.i.d.}}{\sim} \mathcal{M}(n; \alpha_1, \dots, \alpha_Q) , \]

where $\mathcal{M}(n; \alpha_1, \dots, \alpha_Q)$ denotes the multinomial
distribution. Let $Z_{[n]} = (Z_1, \dots, Z_n)$ denote the random label vector
of $(v_1, \dots, v_n)$.

The observation consists of an adjacency matrix
$X_{[n]} = \{X_{i,j}\}_{1 \le i,j \le n}$, where $X_{i,i} = 0$ for every $i$ and

\[ X_{i,j} \mid Z_i = q, Z_j = l \ \sim\ \mathcal{B}(\pi_{q,l}) , \quad \forall i \neq j , \]

where $\mathcal{B}(\pi_{q,l})$ denotes the Bernoulli distribution with parameter
$0 \le \pi_{q,l} \le 1$ for every $(q, l)$.

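The log-likelihood is given by

\[ \mathcal{L}_2(X_{[n]}; \alpha, \pi) = \log \Big( \sum_{z_{[n]}} e^{\mathcal{L}_1(X_{[n]}; z_{[n]}, \pi)} \, \mathbb{P}[Z_{[n]} = z_{[n]}] \Big) , \tag{1} \]

where

\[ \mathcal{L}_1(X_{[n]}; z_{[n]}, \pi) = \sum_{i \neq j} \big\{ X_{i,j} \log \pi_{z_i,z_j} + (1 - X_{i,j}) \log(1 - \pi_{z_i,z_j}) \big\} , \tag{2} \]

and $\mathbb{P}[Z_{[n]} = z_{[n]}] = \mathbb{P}_\alpha[Z_{[n]} = z_{[n]}] = \prod_{i=1}^n \alpha_{z_i}$.
In the following, let $\theta = (\alpha, \pi)$ denote the parameter and
$\theta^* = (\alpha^*, \pi^*)$ be the true parameter value. Notice that the
$X_{i,j}$s are not independent. However, conditioning on $Z_i = q$, $Z_j = l$
yields independence.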

Recall that the number Q of classes is assumed to be known and the purpose 
of the present work is to efficiently estimate the parameters of SBM. 
2.2. Assumptions

In the present section, several assumptions are discussed, which will be used
throughout the paper.

Assumption 1. For every $q \neq q'$, there exists $l \in \{1, \dots, Q\}$ such that

\[ \pi_{q,l} \neq \pi_{q',l} \quad \text{or} \quad \pi_{l,q} \neq \pi_{l,q'} . \tag{A1} \]

Following (A1), the matrix $\pi$ cannot have two columns equal and the
corresponding rows also equal. This constraint is consistent with the goal of SBM,
which is to define $Q$ classes $C_1, \dots, C_Q$ with different structural
properties. For instance, the connectivity properties of vertices in $C_q$ must be
different from those of vertices in $C_l$ with $q \neq l$. Therefore, settings
where this assumption is violated correspond to ill-specified models with too
many classes.

Assumption 2. There exists $\zeta > 0$ such that

\[ \forall (q,l) \in \{1, \dots, Q\}^2 , \quad \pi_{q,l} \in \, ]0,1[ \ \Rightarrow\ \pi_{q,l} \in [\zeta, 1 - \zeta] . \tag{A2} \]

SBM can deal with null probabilities of connection between vertices. However,
the use of $\log \pi_{q,l}$ implies a special treatment for $\pi_{q,l} \in \{0, 1\}$.
Note that throughout the present paper, (A2) is always assumed to hold with
$\zeta$ not depending on $n$.

Assumption 3. There exists $0 < \gamma < 1/Q$ such that

\[ \forall q \in \{1, \dots, Q\} , \quad \alpha_q \in [\gamma, 1 - \gamma] . \tag{A3} \]

This assumption implies that no class is drained. Actually the identifiability
of SBM (Theorem 2.1) requires every $\alpha_q \in (0, 1)$ for
$q \in \{1, \dots, Q\}$, which is implied by (A3). In this paper, it is assumed
that $\gamma$ does not depend on $n$.

Assumption 4. There exist $0 < \gamma < 1/Q$ and $n_0 \in \mathbb{N}^*$ such that
any realization of SBM (Section 2.1) with label vector
$z^*_{[n]} = (z^*_1, \dots, z^*_n)$ satisfies

\[ \forall q \in \{1, \dots, Q\} , \ \forall n > n_0 , \quad \frac{N_q(z^*_{[n]})}{n} \ge \gamma , \tag{A4} \]

where $N_q(z^*_{[n]}) = |\{1 \le i \le n \mid z^*_i = q\}|$.

Note that (A4) is the empirical version of (A3). By definition of SBM,
$z^*_{[n]}$ is the realization of a multinomial random variable with parameters
$(\alpha_1, \dots, \alpha_Q)$. Any multinomial random variable will satisfy the
requirement of (A4) with high probability. This assumption will be used in
particular in Theorem 3.1.

2.3. Identifiability

The identifiability of the parameters in SBM was first obtained by
Allman et al. (2009) for undirected graphs ($\pi$ is symmetric): if $Q = 2$,
$n \ge 16$, and the coefficients of $\pi$ are all different, the parameters are
identifiable up to label switching. Allman et al. (2011) also established that
for $Q > 2$, if $n$ is even and $n \ge (Q - 1 + (Q+2)^2/4)^2$ (with a similar
condition if $n$ is odd), the parameters
of SBM are generically identifiable, that is identifiable except on a set with null
Lebesgue measure.

First, generic identifiability (up to label switching) of the SBM parameters is
proved for directed (or undirected) graphs as long as $n \ge 2Q$.

Theorem 2.1. Let $n \ge 2Q$ and assume that for any $1 \le q \le Q$,
$\alpha_q > 0$ and the coordinates of $r = \pi.\alpha$ are distinct. Then, SBM
parameters are identifiable.

The assumption on the vector $\pi.\alpha$ is not strongly restrictive since the
set of vectors violating this assumption is of Lebesgue measure 0. Therefore,
Theorem 2.1 actually asserts the generic identifiability of SBM (see Allman
et al., 2009). Moreover, Theorem 2.1 also holds with $r' = \pi^t.\alpha$ (instead
of $r = \pi.\alpha$), and with vectors $r''$ given by
$r''_q = \sum_l \pi_{q,l} \pi_{l,q} \alpha_l$, for every $1 \le q \le Q$. Let us
also emphasize that Assumption (A1) is implied by assuming either $\pi.\alpha$ or
$\pi^t.\alpha$ has distinct coordinates (which leads to identifiability). Note
that Bickel et al. (2011, Theorem 2, Section 3.1) also recently derived an
identifiability result for "block models" in terms of "wheel motifs".

Let us further illustrate the assumption on $\pi.\alpha$ through two examples.
The first one is a particular instance of SBM called Affiliation Model 
(Allman et al., 2011) restricted to the setting where Q = 2. 

Example 1 (Affiliation model). From a general point of view, the affiliation model
is used with $Q$ populations of vertices and considers undirected graphs ($\pi$
symmetric). The present example focuses on a particular instance where $Q = 2$.
In this model, the matrix $\pi$ is only parametrized by two coefficients $\pi_1$
and $\pi_2$ ($\pi_1 \neq \pi_2$), which respectively correspond to within-class
and between-class connectivities. With $Q = 2$, the matrix $\pi$ is given by

\[ \pi = \begin{pmatrix} \pi_1 & \pi_2 \\ \pi_2 & \pi_1 \end{pmatrix} . \]

Then, requiring that $(\pi\alpha)_1 = \pi_1 \alpha_1 + \pi_2 \alpha_2$ is not
equal to $(\pi\alpha)_2 = \pi_2 \alpha_1 + \pi_1 \alpha_2$ amounts to imposing
that $\alpha_1 \neq \alpha_2$, since
$(\pi\alpha)_1 - (\pi\alpha)_2 = (\pi_1 - \pi_2)(\alpha_1 - \alpha_2)$. Indeed,
since within- and between-class connectivities are the same for the two classes,
distinguishing between them requires a different proportion of vertices in these
classes ($\alpha_1 \neq \alpha_2$).

Note that Allman et al. (2011) have derived a result on identifiability for af- 
filiation models with equal group proportions. 
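
The condition of Example 1 can be checked numerically. The short Python sketch below builds the two-class affiliation matrix and returns the vector $\pi\alpha$; the helper name `affiliation_r` and the numerical values in the usage note are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def affiliation_r(pi1, pi2, alpha1):
    """Return the vector pi.alpha for the two-class affiliation model.

    pi1: within-class connection probability
    pi2: between-class connection probability
    alpha1: proportion of the first class (the second is 1 - alpha1)
    """
    alpha = np.array([alpha1, 1.0 - alpha1])
    pi = np.array([[pi1, pi2],
                   [pi2, pi1]])
    return pi @ alpha
```

For instance, `affiliation_r(0.7, 0.2, 0.3)` has two distinct coordinates whose difference is $(\pi_1 - \pi_2)(\alpha_1 - \alpha_2)$, whereas with `alpha1 = 0.5` the two coordinates coincide and the assumption of Theorem 2.1 fails.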

The second example describes a more general setting than Example 1 in 
which the assumption on the coordinates of r can be more deeply understood. 

Example 2 (Permutation-invariant matrices). For some matrices $\pi$, there exist
permutations $\sigma : \{1, \dots, Q\} \to \{1, \dots, Q\}$ such that $\pi$
remains unchanged if one permutes both its rows and columns according to
$\sigma$. More precisely, let $\pi^\sigma$ denote the matrix defined by

\[ \pi^\sigma_{q,l} = \pi_{\sigma(q),\sigma(l)} , \]

for every $1 \le q, l \le Q$. Then, $\pi^\sigma = \pi$.

For a given matrix $\pi$, let us define the set of permutations leaving $\pi$
invariant by

\[ \mathfrak{S}^\pi = \{ \sigma : \{1, \dots, Q\} \to \{1, \dots, Q\} \mid \pi^\sigma = \pi \} . \]

The matrix $\pi$ is said to be permutation-invariant if
$\mathfrak{S}^\pi \neq \{Id\}$, where $Id$ denotes the identity permutation. For
instance in the affiliation model (Example 1), $\pi$ is permutation-invariant
since $\mathfrak{S}^\pi$ is the whole set of permutations on $\{1, \dots, Q\}$.

Let us first notice that "label-switching" translates into the following property.
For any permutation $\sigma$ of $\{1, \dots, Q\}$,

\[ \pi^\sigma \alpha^\sigma = (\pi\alpha)^\sigma , \tag{3} \]

where $\alpha^\sigma_q = \alpha_{\sigma(q)}$ for every $q$. The main point is that
label-switching arises whatever the choice of $(\alpha, \pi)$, and for every
$\sigma$.

By contrast, only permutation-invariant matrices satisfy the more specific
following equality. For any permutation-invariant matrix $\pi$, let
$\sigma^\pi \in \mathfrak{S}^\pi$ denote one permutation whose support is of
maximum cardinality. (Such a permutation is not necessarily unique, for instance
with the affiliation model.) Then,

\[ (\pi\alpha)^{\sigma^\pi} = \pi\alpha . \tag{4} \]

Equation (4) amounts to imposing equalities between the coordinates of
$\pi\alpha$ in the support of $\sigma^\pi$. Let us recall that the support of
$\sigma^\pi$ corresponds to rows and columns of $\pi$ that can be permuted
without changing $\pi$. Then, assuming all coordinates of $\pi\alpha$ are
distinct amounts to imposing that classes with the same connectivity properties
have different respective proportions ($\alpha_q$), so that they can be
distinguished from one another.
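
The objects of Example 2 are easy to explore by brute force for small $Q$. The Python sketch below enumerates the permutations leaving a matrix invariant and checks the label-switching identity (3). It assumes the convention that the permuted matrix has entries $\pi_{\sigma(q),\sigma(l)}$ and the permuted vector has entries $\alpha_{\sigma(q)}$; the function names are hypothetical and chosen only for this illustration.

```python
import itertools
import numpy as np

def permute_matrix(pi, sigma):
    """Matrix with entries pi[sigma(q), sigma(l)] (rows and columns permuted)."""
    s = np.array(sigma)
    return pi[np.ix_(s, s)]

def invariant_permutations(pi):
    """All permutations sigma such that the permuted matrix equals pi.

    Brute force over the Q! candidates, so only suitable for small Q.
    """
    Q = pi.shape[0]
    return [s for s in itertools.permutations(range(Q))
            if np.allclose(permute_matrix(pi, s), pi)]

def label_switching_holds(pi, alpha, sigma):
    """Check identity (3): permuted pi times permuted alpha equals permuted (pi.alpha)."""
    s = np.array(sigma)
    return np.allclose(permute_matrix(pi, s) @ alpha[s], (pi @ alpha)[s])
```

With an affiliation matrix (constant diagonal, constant off-diagonal), `invariant_permutations` returns all $Q!$ permutations, while a matrix with distinct diagonal entries is only left invariant by the identity.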

Proof of Theorem 2.1. First, let $\mathbb{P}_{[n]}$ denote the probability
distribution function of the adjacency matrix of SBM. Let us show that there
exists a unique $(\alpha, \pi)$ corresponding to $\mathbb{P}_{[n]}$.

Up to reordering, let $r_1 < r_2 < \dots < r_Q$ denote the coordinates of the
vector $r$ in increasing order: $r_q$ is equal to the probability of an edge from
a given vertex in the class $C_q$.

Let $R$ denote the Vandermonde matrix defined by $R_{i,q} = r_q^i$, for
$0 \le i < Q$ and $1 \le q \le Q$. $R$ is invertible since the coordinates of $r$
are all different. For $i \ge 1$, $R_{i,q}$ is the probability that $i$ given
vertices in $C_q$ have an edge.

Let us also define

\[ u_i = \sum_{1 \le k \le Q} \alpha_k r_k^i , \quad i = 0, \dots, 2Q - 1 . \]

For $i \ge 1$, $u_i$ denotes the probability that the first $i$ coefficients of
the first row of $X_{[n]}$ are equal to 1. Note that $n \ge 2Q$ is a necessary
requirement on $n$ since $X_{i,i} = 0$ by assumption. Hence given
$\mathbb{P}_{[n]}$, $u_0 = 1$ and $u_1, \dots, u_{2Q-1}$ are known.
Furthermore, let $M$ be the $(Q + 1) \times Q$ matrix given by
$M_{i,j} = u_{i+j}$ for every $0 \le i \le Q$ and $0 \le j < Q$, and let $M_i$
denote the square matrix obtained by removing the row $i$ from $M$. The
coefficients of $M_Q$ are

\[ M_{i,j} = u_{i+j} = \sum_{1 \le k \le Q} r_k^i \alpha_k r_k^j , \quad \text{with } 0 \le i, j < Q . \]

Defining the diagonal matrix $A = \mathrm{Diag}(\alpha)$, it comes that
$M_Q = R A R^t$, where $R$ and $A$ are invertible, but unknown at this stage. With
$D_k = \det(M_k)$ and the polynomial
$B(x) = \sum_{k=0}^{Q} (-1)^{k+Q} D_k x^k$, it yields
$D_Q = \det(M_Q) \neq 0$ and the degree of $B$ is equal to $Q$.

Set $V_i = (1, r_i, \dots, r_i^Q)^t$ and let us notice that $B(r_i)$ is the
determinant of the square matrix produced when appending $V_i$ as last column to
$M$. The $Q + 1$ columns of this matrix are linearly dependent, since they are
all linear combinations of the $Q$ vectors $V_1, V_2, \dots, V_Q$. Hence
$B(r_i) = 0$ and $r_i$ is a root of $B$ for every $1 \le i \le Q$. This proves
that $B = D_Q \prod_{i=1}^{Q} (x - r_i)$. Then, one knows
$r = (r_1, \dots, r_Q)$ (as the roots of $B$ defined from $M$) and $R$. It results
that $A = R^{-1} M_Q (R^t)^{-1}$, which yields a unique
$(\alpha_1, \dots, \alpha_Q)$.

It only remains to determine $\pi$. For $0 \le i, j < Q$, let us introduce
$U_{i,j}$ the probability that the first row of $X_{[n]}$ begins with $i + 1$
occurrences of 1, and the second row of $X_{[n]}$ ends up with $j$ occurrences of
1 ($i + 1 + j \le n - 1$ implies $n \ge 2Q$).

Then, $U_{i,j} = \sum_{k,l} r_k^i \alpha_k \pi_{k,l} \alpha_l r_l^j$, for
$0 \le i, j < Q$, and the $Q \times Q$ matrix $U = R A \pi A R^t$. The conclusion
results from $\pi = A^{-1} R^{-1} U (R^t)^{-1} A^{-1}$.

□ 
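
The proof of Theorem 2.1 is constructive, and its steps can be replayed numerically. The Python sketch below first computes the quantities $u_i$ and $U_{i,j}$ from a known $(\alpha, \pi)$, standing in for what would be read off the distribution of the adjacency matrix, and then recovers $r$, $\alpha$ and $\pi$ through the polynomial $B$ and the Vandermonde matrix $R$. The function names `moments` and `recover` are illustrative assumptions, and the recovered classes are ordered by increasing $r_q$, in line with the "up to reordering" step of the proof.

```python
import numpy as np

def moments(alpha, pi):
    """Compute u_i (i = 0..2Q-1) and U = R A pi A R^t from known parameters."""
    Q = len(alpha)
    r = pi @ alpha                                   # r_q = (pi.alpha)_q
    u = np.array([np.sum(alpha * r**i) for i in range(2 * Q)])
    R = np.vander(r, Q, increasing=True).T           # R[i, q] = r_q^i
    A = np.diag(alpha)
    return u, R @ A @ pi @ A @ R.T

def recover(u, U, Q):
    """Recover (r, alpha, pi) from u and U, following the proof of Theorem 2.1."""
    # M is the (Q+1) x Q matrix with entries u_{i+j}
    M = np.array([[u[i + j] for j in range(Q)] for i in range(Q + 1)])
    # D_k is the determinant of M with row k removed
    D = [np.linalg.det(np.delete(M, k, axis=0)) for k in range(Q + 1)]
    # Coefficients of B(x) = sum_k (-1)^(k+Q) D_k x^k, by increasing degree
    coeffs = [(-1) ** (k + Q) * D[k] for k in range(Q + 1)]
    # np.roots expects the highest-degree coefficient first
    r = np.sort(np.roots(coeffs[::-1]).real)
    R_inv = np.linalg.inv(np.vander(r, Q, increasing=True).T)
    alpha = np.diag(R_inv @ M[:Q] @ R_inv.T)         # A = R^{-1} M_Q (R^t)^{-1}
    A_inv = np.diag(1.0 / alpha)
    pi = A_inv @ R_inv @ U @ R_inv.T @ A_inv
    return r, alpha, pi
```

Since the inversion of a Vandermonde matrix is numerically delicate when the coordinates of $r$ are close to each other, this sketch is meant as a check of the algebra on small, well-separated examples rather than as an estimation procedure.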

The assumption of Theorem 2.1 on $r$ ($r'$ or $r''$), leading to generic
identifiability, can be further relaxed in the particular case where $n = 4$ and
$Q = 2$.

Theorem 2.2. Set $n = 4$, $Q = 2$ and let us assume that $\alpha_q > 0$ for every
$1 \le q \le Q$, and the coefficients of $\pi$ are not all equal. Then, SBM is
identifiable.

The proof of this result is deferred to Appendix A.

Note that when $Q = 2$, (A1) implies the coefficients of $\pi$ are not all equal.

3. Maximum-likelihood estimation of SBM parameters
3.1. Asymptotics of $\mathbb{P}(Z_{[n]} = \cdot \mid X_{[n]})$

In this section we study the a posteriori probability distribution function of
$Z_{[n]}$, $\mathbb{P}(Z_{[n]} = \cdot \mid X_{[n]})$, which is a random variable
depending on $X_{[n]}$.
3.1.1. Equivalence classes between label sequences

Let us consider a realization of the SBM random graph generated with the
sequence of true labels $Z = z^*$, where $z^* = \{z^*_i\}_{i \in \mathbb{N}^*}$.

Since a given matrix $\pi$ can be permutation-invariant (see Example 2 in
Section 2.3), the mapping $z \mapsto \{\pi_{z_i,z_j}\}_{i,j \in \mathbb{N}^*}$
can be non injective. To remedy this problem, let us introduce an equivalence
relation between two sequences of labels $z$ and $z'$:

\[ z \sim_\pi z' \iff \exists \sigma \in \mathfrak{S}^\pi \mid z'_i = \sigma(z_i), \ \forall i \in \mathbb{N}^* . \]

Then $z \sim_\pi z'$ is equivalent to $[z]_\pi = [z']_\pi$, where $[z]_\pi$
denotes the equivalence class of $z$. As a consequence, any vectors $z_{[n]}$ and
$z'_{[n]}$ in the same class have the same conditional likelihood (2):

\[ \mathcal{L}_1(X_{[n]}; z_{[n]}, \pi) = \mathcal{L}_1(X_{[n]}; z'_{[n]}, \pi) . \]

From now on, square-brackets in the equivalence class notation will be removed
to simplify the reading as long as no confusion can be made. In such
cases, $z$ will be understood as the equivalence class of the label sequence.

3.1.2. Main asymptotic result 

Let $\mathbb{P}^* := \mathbb{P}(\cdot \mid Z = z^*) = \mathbb{P}_{\alpha^*,\pi^*}(\cdot \mid Z = z^*)$
denote the true conditional distribution given the (equivalence class of the)
whole label sequence, the notation emphasizing that $\mathbb{P}^*$ depends on
$(\alpha^*, \pi^*)$.

The following Theorem 3.1 provides the convergence rate of
$\mathbb{P}(Z_{[n]} = z^*_{[n]} \mid X_{[n]}) = \mathbb{P}_{\alpha^*,\pi^*}(Z_{[n]} = z^*_{[n]} \mid X_{[n]})$
towards 1 with respect to $\mathbb{P}^*$, that is given $Z = z^*$. It is an
important result that will be repeatedly used throughout the paper.

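Theorem 3.1. Let us assume that assumptions (A1)-(A4) hold. For every
$t > 0$,

\[ \mathbb{P}^* \Big[ \sum_{z_{[n]} \neq z^*_{[n]}} \frac{\mathbb{P}(Z_{[n]} = z_{[n]} \mid X_{[n]})}{\mathbb{P}(Z_{[n]} = z^*_{[n]} \mid X_{[n]})} > t \Big] = O\big(n e^{-\kappa n}\big) , \]

where $\kappa > 0$ is a constant depending on $\pi^*$ but not on $z^*$, and the
$O(n e^{-\kappa n})$ is uniform with respect to $z^*$.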

Furthermore, the same result holds with P* replaced by P under (A1)-(A3). 

The proof of Theorem 3.1 is deferred to Appendix B. 

A noticeable feature of this result is that the convergence rate does not depend 
on z* . This point turns out to be crucial when deriving consistency for the MLE 
and the variational estimator (respectively Section 3.2 and Section 4.2). Besides, 
the exponential bound of Theorem 3.1 allows the use of the Borel-Cantelli lemma
to get the $\mathbb{P}$-almost sure convergence.
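Corollary 3.2. With the same notation as Theorem 3.1,

\[ \sum_{z_{[n]} \neq z^*_{[n]}} \frac{\mathbb{P}(Z_{[n]} = z_{[n]} \mid X_{[n]})}{\mathbb{P}(Z_{[n]} = z^*_{[n]} \mid X_{[n]})} \xrightarrow[n \to +\infty]{} 0 , \quad \mathbb{P}\text{-a.s.} \]

Moreover,

\[ \mathbb{P}(Z_{[n]} = z^*_{[n]} \mid X_{[n]}) \xrightarrow[n \to +\infty]{} 1 , \quad \mathbb{P}\text{-a.s.} , \]

and for every $z_{[n]} \neq z^*_{[n]}$,

\[ \mathbb{P}(Z_{[n]} = z_{[n]} \mid X_{[n]}) \xrightarrow[n \to +\infty]{} 0 , \quad \mathbb{P}\text{-a.s.} \]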

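As a consequence of the previous Corollary 3.2, one can also understand the
above phenomenon in terms of the conditional distribution of the equivalence
class $Z_{[n]}$ given $X_{[n]}$.

Corollary 3.3.

\[ \mathcal{D}(Z_{[n]} \mid X_{[n]}) \xrightarrow[n \to +\infty]{w} \delta_{z^*} , \quad \mathbb{P}\text{-a.s.} , \]

where $\mathcal{D}(Z_{[n]} \mid X_{[n]})$ denotes the distribution of $Z_{[n]}$
given $X_{[n]}$, $\xrightarrow{w}$ refers to the weak convergence in
$\mathcal{M}_1(\mathcal{Z})$, the set of probability measures on
$\mathcal{E}(\mathcal{Z})$, the set of equivalence classes on
$\mathcal{Z} = \{1, \dots, Q\}^{\mathbb{N}^*}$, and $\delta_{z^*}$ is the Dirac
measure at the equivalence class $z^*$.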

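Proof of Corollary 3.3. For every $n \in \mathbb{N}^*$, let us define
$\mathcal{Z}_n = \{1, \dots, Q\}^n$ and $\mathcal{E}(\mathcal{Z}_n)$ the
corresponding set of equivalence classes. Let us further introduce a metric space
$(\mathcal{E}(\mathcal{Z}_n), d_n)$, where the distance $d_n$ is given by

\[ \forall z, z' \in \mathcal{E}(\mathcal{Z}_n) , \quad d_n(z, z') = \min_{u \in z, \, v \in z'} \sum_{k=1}^{n} 2^{-k} \mathbb{1}_{(u_k \neq v_k)} . \]

Similarly for $\mathcal{Z} = \{1, \dots, Q\}^{\mathbb{N}^*}$,
$(\mathcal{E}(\mathcal{Z}), d)$ denotes a metric space with

\[ \forall z, z' \in \mathcal{E}(\mathcal{Z}) , \quad d(z, z') = \min_{u \in z, \, v \in z'} \sum_{k \ge 1} 2^{-k} \mathbb{1}_{(u_k \neq v_k)} . \]

Then, $\mathcal{E}(\mathcal{Z}_n)$ can be embedded into
$\mathcal{E}(\mathcal{Z})$, so that $\mathcal{E}(\mathcal{Z}_n)$ is identified
with a subset of $\mathcal{E}(\mathcal{Z})$.

Let us introduce $\mathcal{B}$ the Borel $\sigma$-algebra on
$\mathcal{E}(\mathcal{Z})$, and $\mathcal{B}_n$ the $\sigma$-algebra induced by
$\mathcal{B}$ on $\mathcal{E}(\mathcal{Z}_n)$. Let also
$\mathbb{P}^n = \mathbb{P}[\, \cdot \mid X_{[n]}]$ denote a probability measure
on $\mathcal{B}$, and $\mathbb{E}_n[\, \cdot \,]$ the expectation with respect to
$\mathbb{P}^n$.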
Set $h \in C_b(\mathcal{Z})$ (continuous bounded functions on
$\mathcal{E}(\mathcal{Z})$) such that $\|h\|_\infty \le M$ for $M > 0$. By
continuity at point $z^*$, for every $\epsilon > 0$, there exists $\eta > 0$ such
that

\[ d(z, z^*) \le \eta \ \Rightarrow\ |h(z^*) - h(z)| \le \epsilon . \]

Then,

\[ \big| \mathbb{E}_n[h(Z_{[n]})] - h(z^*) \big| \le \sum_{z_{[n]}} \big| h(z_{[n]}) - h(z^*) \big| \, \mathbb{P}^n(Z_{[n]} = z_{[n]}) \]
\[ \le \epsilon + 2M \sum_{z_{[n]} \notin B^*} \mathbb{P}^n(Z_{[n]} = z_{[n]}) \]
\[ \le \epsilon + o_P(1) \quad \mathbb{P}\text{-a.s.} , \]

where $B^* = B(z^*, \eta)$ denotes the ball in $\mathcal{E}(\mathcal{Z})$ with
radius $\eta$ with respect to $d$. In the last inequality, $o_P(1)$ results from
Corollary 3.2, which yields the result.

□ 

3.2. MLE consistency 

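The main focus of this section is to settle the consistency of the MLE of
$(\alpha^*, \pi^*)$. Let us start by recalling the SBM log-likelihood (1):

\[ \mathcal{L}_2(X_{[n]}; \alpha, \pi) = \log \Big( \sum_{z_{[n]}} e^{\mathcal{L}_1(X_{[n]}; z_{[n]}, \pi)} \, \mathbb{P}[Z_{[n]} = z_{[n]}] \Big) , \]

where $\mathbb{P}[Z_{[n]} = z_{[n]}] = \prod_{i=1}^n \alpha_{z_i}$, and
$(\alpha, \pi)$ are the SBM parameters. Note that
$\mathcal{L}_2(X_{[n]}; \alpha, \pi)$ is an involved expression to deal with.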

First, the $X_{i,j}$s are not independent, which strongly differs from usual
statistical settings. For this reason, no theoretical result has ever been
derived for the MLE of SBM parameters.

Second, another non standard feature of $\mathcal{L}_2$ is the number of random
variables, which is $n(n-1)$ (and not $n$ as usual). More precisely, there are
$n(n-1)$ edges $X_{i,j}$s but only $n$ vertices. This unusual scaling difference
implies a refined treatment of the normalizing constants $n$ and $n(n-1)$,
depending on the estimated parameter $\alpha$ and $\pi$ respectively. As a
consequence, the MLE consistency proof has been split into two parts: (i) the
consistency of the $\pi$ estimator is addressed by use of an approach based on
M-estimators, (ii) a result similar to Theorem 3.1 is combined with a
"deconditioning" argument to get the consistency of the $\alpha^*$ estimator
(Theorem 3.9) at the price of an additional assumption on the rate of
convergence of the estimator $\hat{\pi}$ of $\pi^*$.

The consistency of the MLE of $\pi$ strongly relies on a general theorem which
is inspired from that for M-estimators (van der Vaart and Wellner, 1996).
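Theorem 3.4. Let $(\Theta, d)$ and $(\Psi, d')$ denote metric spaces, and let
$M_n : \Theta \times \Psi \to \mathbb{R}$ be a random function and
$M : \Theta \to \mathbb{R}$ a deterministic one such that for every
$\epsilon > 0$,

\[ \sup_{d(\theta, \theta_0) \ge \epsilon} M(\theta) < M(\theta_0) , \tag{5} \]

\[ \sup_{(\theta, \psi) \in \Theta \times \Psi} |M_n(\theta, \psi) - M(\theta)| := \|M_n - M\|_{\Theta \times \Psi} \xrightarrow[n \to +\infty]{P} 0 . \tag{6} \]

Moreover, set $(\hat{\theta}, \hat{\psi}) = \mathrm{Argmax}_{\theta, \psi} M_n(\theta, \psi)$. Then,

\[ d(\hat{\theta}, \theta_0) \xrightarrow[n \to +\infty]{P} 0 . \]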

One important difference between Theorem 3.4 and its usual counterpart
for M-estimators (van der Vaart and Wellner, 1996) is that $M_n$ and $M$ do not
depend on the same number of arguments. Our consistency result for the MLE
of $\pi$ strongly relies on this point.

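Proof of Theorem 3.4. For every $\eta > 0$, there exists $\delta > 0$ such that

\[ \mathbb{P}\big[ d(\hat{\theta}, \theta_0) \ge \eta \big] \le \mathbb{P}\big[ M(\hat{\theta}) \le M(\theta_0) - 3\delta \big] . \]

Since $\|M_n - M\|_{\Theta \times \Psi} \xrightarrow[n \to +\infty]{P} 0$, it
comes that for large enough values of $n$,

\[ \mathbb{P}\big[ d(\hat{\theta}, \theta_0) \ge \eta \big] \le \mathbb{P}\big[ M_n(\hat{\theta}, \hat{\psi}) \le M_n(\theta_0, \psi_0) - \delta \big] + o(1) \le o(1) . \]

□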



The leading idea in what follows is to check the assumptions of Theorem 3.4.

The main point of our approach consists in using
$\mathbb{P}^* = \mathbb{P}_{\alpha^*,\pi^*}(\cdot \mid Z = z^*)$ (Section 3.1.2)
as a reference probability measure, that is working as if
$Z_{[n]} = z^*_{[n]}$ were known. In this setting, a key quantity is

\[ \mathcal{L}_1(X_{[n]}; z_{[n]}, \pi) = \sum_{i \neq j} \big\{ X_{i,j} \log \pi_{z_i,z_j} + (1 - X_{i,j}) \log(1 - \pi_{z_i,z_j}) \big\} , \]

where $(z_{[n]}, \pi)$ are interpreted as parameters. For any $(z_{[n]}, \pi)$,
let us introduce

\[ \phi_n(z_{[n]}, \pi) := \frac{1}{n(n-1)} \mathcal{L}_1(X_{[n]}; z_{[n]}, \pi) , \qquad \Phi_n(z_{[n]}, \pi) := \mathbb{E}\big[ \phi_n(z_{[n]}, \pi) \mid Z_{[n]} = z^*_{[n]} \big] , \]

where the expectation is computed with respect to $\mathbb{P}^*$. Actually our
strategy (using Theorem 3.4) only requires proving that $\phi_n$ and $\Phi_n$ are
uniformly close to each other on a subset of parameters denoted by $\mathcal{P}$
(see also the proof of Theorem 3.6 for more details) and defined as follows

\[ \mathcal{P} = \big\{ (z_{[n]}, \pi) \mid \text{(A1)}, \text{(A2)}, \ |\Phi_n(z_{[n]}, \pi)| < +\infty \big\} . \tag{7} \]

Showing this uniform convergence between $\phi_n$ and $\Phi_n$ over $\mathcal{P}$
is precisely the purpose of Proposition 3.5. Its proof, which is deferred to
Appendix C, strongly relies on Talagrand's concentration inequality (Massart,
2007).

Proposition 3.5. With the above notation, let us assume (A1) and (A2) hold
true. Then,

\[ \sup_{\mathcal{P}} \big| \phi_n(z_{[n]}, \pi) - \Phi_n(z_{[n]}, \pi) \big| \xrightarrow[n \to +\infty]{P} 0 . \]

Actually Proposition 3.5 is crucial to prove the following theorem, which
settles the desired properties for $\mathcal{L}_2(X_{[n]}; \alpha, \pi)$, that is
(5) (well-identifiability) and (6) (uniform convergence).

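Theorem 3.6. Let us assume that (A1), (A2), and (A3) hold, and for every
$(\alpha, \pi)$, set $M_n(\alpha, \pi) = [n(n-1)]^{-1} \mathcal{L}_2(X_{[n]}; \alpha, \pi)$, and

\[ M(\pi) = \max_{\{a_{q,q'}\} \in \mathcal{A}} \Big\{ \sum_{q,l} \sum_{q',l'} \alpha^*_q \alpha^*_l \, a_{q,q'} a_{l,l'} \big[ \pi^*_{q,l} \log \pi_{q',l'} + (1 - \pi^*_{q,l}) \log(1 - \pi_{q',l'}) \big] \Big\} , \]

where $(\alpha^*, \pi^*)$ denotes the true parameter of SBM, and
$\mathcal{A} = \big\{ A = (a_{q,q'})_{1 \le q,q' \le Q} \mid a_{q,q'} \ge 0, \ \sum_{q'} a_{q,q'} = 1 \big\} \subset \mathcal{M}_Q(\mathbb{R})$.

Then for any $\eta > 0$,

\[ \sup_{d(\pi, \pi^*) \ge \eta} M(\pi) < M(\pi^*) , \]

\[ \sup_{\alpha, \pi} |M_n(\alpha, \pi) - M(\pi)| \xrightarrow[n \to +\infty]{P} 0 , \]

where $d$ denotes a distance.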

The proof of Theorem 3.6 is given in Appendix D. Its uniform convergence
part exploits the connection between $\phi_n(z_{[n]}, \pi)$ and
$\mathcal{L}_2(X_{[n]}; \alpha, \pi)$ (Proposition 3.5).

Let us now deduce Corollary 3.7, which asserts the consistency of the
MLE of $\pi^*$.

Corollary 3.7. Under the same assumptions as Theorem 3.6, let us define the
MLE of $(\alpha^*, \pi^*)$

\[ (\hat{\alpha}, \hat{\pi}) := \mathrm{Argmax}_{(\alpha, \pi)} \mathcal{L}_2(X_{[n]}; \alpha, \pi) . \]

Then for any distance $d(\cdot, \cdot)$ on the set of parameters $\pi$,

\[ d(\hat{\pi}, \pi^*) \xrightarrow[n \to +\infty]{P} 0 . \]
Proof of Corollary 3.7. This is a straightforward consequence of Theorem 3.4
and Theorem 3.6. □



A quick inspection of the proof of uniform convergence in Theorem 3.6 shows
that the asymptotic behavior of the log-likelihood $\mathcal{L}_2$ does not
depend on $\alpha$. Roughly speaking, this results from the expression of
$\mathcal{L}_2$ in which the number of terms involving $\pi$ is of order $n^2$
whereas only $n$ terms involve $\alpha$. This difference of scaling with respect
to $n$ between $\pi$ and $\alpha$ justifies to some extent a different
approach for the MLE of $\alpha^*$.

Our proof heavily relies on an analogous result to Theorem 3.1, where the
true value $(\alpha^*, \pi^*)$ of SBM parameters is replaced by an estimator
$(\hat{\alpha}, \hat{\pi})$. In what follows,
$\hat{\mathbb{P}}(Z_{[n]} = z_{[n]} \mid X_{[n]}) = \mathbb{P}_{\hat{\alpha},\hat{\pi}}(Z_{[n]} = z_{[n]} \mid X_{[n]})$
(Section 3.1.2 and Lemma E.2) denotes the same quantity as
$\mathbb{P}(Z_{[n]} = z_{[n]} \mid X_{[n]})$ where $(\alpha^*, \pi^*)$
has been replaced by $(\hat{\alpha}, \hat{\pi})$. Let us state this result in a
general framework since it will be successively used in the proofs of
Theorems 3.9 and 4.4.

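Proposition 3.8. Let us assume that assumptions (A1)-(A4) hold, and
that there exists an estimator $\hat{\pi}$ such that
$\|\hat{\pi} - \pi^*\|_\infty = o_P(v_n)$, with $v_n = o(\sqrt{\log n}/n)$. Let
also $\hat{\alpha}$ denote any estimator of $\alpha^*$. Then for every
$\epsilon > 0$,

\[ \mathbb{P}^* \Big[ \sum_{z_{[n]} \neq z^*_{[n]}} \frac{\hat{\mathbb{P}}(Z_{[n]} = z_{[n]} \mid X_{[n]})}{\hat{\mathbb{P}}(Z_{[n]} = z^*_{[n]} \mid X_{[n]})} > \epsilon \Big] \le \ \ldots \]

for $n$ large enough, where $\kappa_1, \kappa_2 > 0$ are constants independent of
$z^*$, and $\log$ ...

Moreover, the same result holds replacing $\mathbb{P}^*$ by $\mathbb{P}$ under
(A1)-(A3).
The proof of Proposition 3.8 is given in Appendix E.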

In the same way as in Theorem 3.1, one crucial point in Proposition 3.8 is the
independence of the convergence rate with respect to $z^*_{[n]}$. Note that the
novelty of Proposition 3.8 compared to Theorem 3.1 lies in the convergence rate,
which depends on the behavior of $\hat{\pi}$. This is the price to pay for
estimating rather than knowing $\pi^*$.

We assume $v_n = o(\sqrt{\log n}/n)$, which arises from the proof as a
requirement for consistency. However, we do not know whether this is a necessary
or only a sufficient condition. Furthermore there is empirical evidence (see
Gazal et al., 2011) that the rate of convergence of $\hat{\pi}$ is of order
$1/n$, but this property is assumed and not proved in the present paper.
Let us now settle the consistency of the MLE of $\alpha^*$ on the basis of the
previous Proposition 3.8.

Theorem 3.9. Let $(\hat{\alpha}, \hat{\pi})$ denote the MLE of
$(\alpha^*, \pi^*)$ and assume
$\|\hat{\pi} - \pi^*\|_\infty = o_P(\sqrt{\log n}/n)$. With the same assumptions
as Theorem 3.6, and the notation of Corollary 3.7, then

\[ d(\hat{\alpha}, \alpha^*) \xrightarrow[n \to +\infty]{P} 0 , \]

where $d$ denotes any distance between vectors in $\mathbb{R}^Q$.

Note that the rate $1/\sqrt{n}$ would be reached in "classical" parametric models
with independent random variables.

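Proof of Theorem 3.9. In the mixture model framework of SBM, Lemma E.2
shows the MLE of $\alpha$ is given for any $q$ by

\[ \hat{\alpha}_q = \frac{1}{n} \sum_{i=1}^{n} \hat{\mathbb{P}}(Z_i = q \mid X_{[n]}) . \]

First, let us work with respect to $\mathbb{P}^*$, that is given
$Z_{[n]} = z^*_{[n]}$. Setting
$N_q(z_{[n]}) = \sum_{i=1}^{n} \mathbb{1}_{(z_i = q)}$, it comes

\[ \big| \hat{\alpha}_q - N_q(z^*_{[n]})/n \big| = \Big| \frac{1}{n} \sum_{i=1}^{n} \hat{\mathbb{P}}(Z_i = q \mid X_{[n]}) - \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}_{(z^*_i = q)} \Big| \]
\[ \le \frac{1}{n} \sum_{i=1}^{n} \big| \hat{\mathbb{P}}(Z_i = q \mid X_{[n]}) - \mathbb{1}_{(z^*_i = q)} \big| \]
\[ \le \frac{1}{n} \sum_{i=1}^{n} \big\{ \hat{\mathbb{P}}(Z_i \neq z^*_i \mid X_{[n]}) \mathbb{1}_{(z^*_i = q)} + \hat{\mathbb{P}}(Z_i \neq z^*_i \mid X_{[n]}) \mathbb{1}_{(z^*_i \neq q)} \big\} \]
\[ \le \hat{\mathbb{P}}(Z_{[n]} \neq z^*_{[n]} \mid X_{[n]}) . \]

Note the last inequality results from

\[ \frac{1}{n} \sum_{i} \hat{\mathbb{P}}(Z_i \neq z^*_i \mid X_{[n]}) \le \max_{i} \hat{\mathbb{P}}(Z_i \neq z^*_i \mid X_{[n]}) \le \hat{\mathbb{P}}\big[ \cup_{i=1}^{n} (Z_i \neq z^*_i) \mid X_{[n]} \big] = \hat{\mathbb{P}}(Z_{[n]} \neq z^*_{[n]} \mid X_{[n]}) . \]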

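Second, let us now use a "deconditioning argument" replacing $\mathbb{P}^*$ by
$\mathbb{P}$. Let $N_q = N_q(Z_{[n]})$ denote a binomial random variable
$\mathcal{B}(n, \alpha^*_q)$ for every $q$. Then for every $\epsilon > 0$,

\[ \mathbb{P}\big[ |\hat{\alpha}_q - \alpha^*_q| > \epsilon \big] \le \mathbb{P}\big[ |\hat{\alpha}_q - N_q/n| > \epsilon/2 \big] + \mathbb{P}\big[ |N_q/n - \alpha^*_q| > \epsilon/2 \big] \]
\[ \le \mathbb{P}\big[ |\hat{\alpha}_q - N_q/n| > \epsilon/2 \big] + o(1) , \]

by use of the law of large numbers. Finally, a straightforward use of
Proposition 3.8 leads to

\[ \mathbb{P}\big[ |\hat{\alpha}_q - N_q/n| > \epsilon/2 \big] = \mathbb{E}_{Z_{[n]}} \big[ \mathbb{P}\big( |\hat{\alpha}_q - N_q/n| > \epsilon/2 \mid Z_{[n]} \big) \big] = o(1) . \]

□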
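
The first step of the proof of Theorem 3.9 is a purely deterministic bound, which the following Python sketch checks numerically. Here `tau[i, q]` is a hypothetical array playing the role of the estimated posterior probability that vertex $i$ belongs to class $q$; any row-stochastic matrix can be used, and the function names are illustrative rather than taken from the paper.

```python
import numpy as np

def alpha_hat(tau):
    """Class proportion estimates: average of the posterior marginals over vertices."""
    return tau.mean(axis=0)

def misclassification_mass(tau, z_star):
    """Average over i of the posterior probability that Z_i differs from z*_i."""
    n = len(z_star)
    return 1.0 - tau[np.arange(n), z_star].mean()

def empirical_frequencies(z_star, Q):
    """N_q(z*) / n for q = 0, ..., Q-1."""
    return np.bincount(z_star, minlength=Q) / len(z_star)
```

For every class $q$, the gap between `alpha_hat(tau)[q]` and `empirical_frequencies(z_star, Q)[q]` is at most `misclassification_mass(tau, z_star)`, which is the intermediate quantity bounded in the proof by the posterior probability that the whole label vector differs from the true one.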



4. Variational estimators of SBM parameters 



In Section 3, consistency has been proved for the maximum likelihood estima- 
tors. However this result is essentially theoretical since in practice the MLE can 
only be computed for very small graphs (with less than 20 vertices). Neverthe- 
less, such results for the MLE are useful in at least two respects. First from a 
general point of view, they provide a new strategy to derive consistency of esti- 
mators obtained from likelihoods in non-i.i.d. settings. Second in the framework 
of the present paper, these results are exploited to settle the consistency of the 
variational estimators. 

The main interest of variational estimators in SBM is that, unlike the MLE,
they are useful in practice since they can handle very large graphs
(several thousands of vertices). Indeed the log-likelihood
$\mathcal{L}_2(X_{[n]}; \alpha, \pi)$ involves
a sum over $Q^n$ terms, which is intractable except for very small and unrealistic
values of $n$:

\[ \mathcal{L}_2(X_{[n]}; \alpha, \pi) = \log \Big\{ \sum_{z_{[n]}} e^{\sum_{i \neq j} b_{ij}(z_i, z_j)} \, \mathbb{P}_{Z_{[n]}}(z_{[n]}) \Big\} , \tag{8} \]

with $b_{ij}(z_i, z_j) = X_{i,j} \log \pi_{z_i,z_j} + (1 - X_{i,j}) \log(1 - \pi_{z_i,z_j})$.
To circumvent this problem, alternatives are for instance Markov chain Monte
Carlo (MCMC) algorithms (Andrieu and Atchade, 2007) and variational
approximation (Jordan et al., 1999). However, MCMC algorithms suffer from a high
computational cost, which makes them unattractive compared to variational
approximation. Actually the variational method can deal with thousands of
vertices in a reasonable computation time thanks to its complexity in $O(n^2)$.
For instance, the Mixnet (2009) package (based on variational approximation)
deals with up to several thousands of vertices, whereas the STOCNET package (see
Boer et al., 2006) (Gibbs sampling) only deals with a few hundreds of vertices.
Note that other approaches based on profile-likelihood have been recently
developed and studied for instance by Bickel and Chen (2009).

The purpose of the present section is to prove that the variational approxi- 
mation yields consistent estimators of the SBM parameters. The resulting esti- 
mators will be called variational estimators (VE). 

4.1. Variational approximation

To the best of our knowledge, the first use of variational approximation for SBM has been made by Daudin et al. (2008). The variational method consists in approximating $P^{X_{[n]}} = P(Z_{[n]} = \cdot \mid X_{[n]})$ by a product of $n$ multinomial distributions (see (9)). The computational virtue of this trick is to replace an intractable sum over $Q^n$ terms (see (8)) by a sum over only $n^2$ terms (Eq. (11)). Let us define $\mathcal{D}_n$ as a set of product multinomial distributions

\[ \mathcal{D}_n = \Big\{ D_{\tau_{[n]}} = \prod_{i=1}^n \mathcal{M}(1; \tau_{i,1}, \ldots, \tau_{i,Q}) \ \Big|\ \tau_{[n]} \in \mathcal{S}_n \Big\} , \qquad (9) \]

where

\[ \mathcal{S}_n = \Big\{ \tau_{[n]} = (\tau_1, \ldots, \tau_n) \in ([0,1]^Q)^n \ \Big|\ \forall i,\ \tau_i = (\tau_{i,1}, \ldots, \tau_{i,Q}),\ \sum_{q=1}^Q \tau_{i,q} = 1 \Big\} . \]

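For any $D_{\tau_{[n]}} \in \mathcal{D}_n$, the variational log-likelihood $\mathcal{J}(\cdot; \cdot, \cdot, \cdot)$ is defined by

\[ \mathcal{J}(X_{[n]}; \tau_{[n]}, \alpha, \pi) = \mathcal{L}_2(X_{[n]}; \alpha, \pi) - K(D_{\tau_{[n]}}, P^{X_{[n]}}) , \qquad (10) \]

where $K(\cdot, \cdot)$ denotes the Kullback-Leibler divergence, and $P^{X_{[n]}} = P(Z_{[n]} = \cdot \mid X_{[n]})$. With this choice of $\mathcal{D}_n$, $\mathcal{J}(X_{[n]}; \tau_{[n]}, \alpha, \pi)$ has the following expression (see Daudin et al. (2008) and the proof of Lemma F.3):

\[ \mathcal{J}(X_{[n]}; \tau_{[n]}, \alpha, \pi) = \sum_{i \ne j} \sum_{q,l} b_{ij}(q,l)\, \tau_{i,q} \tau_{j,l} - \sum_{i} \sum_{q} \tau_{i,q} (\log \tau_{i,q} - \log \alpha_q) , \qquad (11) \]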

where $b_{ij}(q,l) = X_{ij} \log \pi_{q,l} + (1 - X_{ij}) \log(1 - \pi_{q,l})$. The main improvement of Eq. (11) upon Eq. (8) is that $\mathcal{J}(X_{[n]}; \tau_{[n]}, \alpha, \pi)$ can be fully computed for every $(\tau_{[n]}, \alpha, \pi)$. The variational approximation $R_{X_{[n]}}$ to $P^{X_{[n]}}$ is given by solving the minimization problem over $\mathcal{D}_n$:

\[ R_{X_{[n]}} \in \mathrm{Argmin}_{D_\tau \in \mathcal{D}_n} K(D_\tau, P^{X_{[n]}}) , \]

as long as such a minimizer exists, which amounts to maximizing $\mathcal{J}(X_{[n]}; \tau_{[n]}, \alpha, \pi)$ as follows

\[ \hat\tau_{[n]} = \hat\tau_{[n]}(\pi, \alpha) := \mathrm{Argmax}_{\tau_{[n]}} \mathcal{J}(X_{[n]}; \tau_{[n]}, \alpha, \pi) . \]

The variational estimators (VE) of $(\alpha^*, \pi^*)$ are

\[ (\tilde\alpha, \tilde\pi) = \mathrm{Argmax}_{\alpha,\pi} \mathcal{J}(X_{[n]}; \hat\tau_{[n]}, \alpha, \pi) . \qquad (12) \]

Note that in practice, the variational algorithm maximizes $\mathcal{J}(X_{[n]}; \tau, \alpha, \pi)$ alternately with respect to $\tau$ and $(\alpha, \pi)$ (see Daudin et al., 2008). Furthermore, since $(\alpha, \pi) \mapsto \mathcal{J}(X_{[n]}; \tau_{[n]}, \alpha, \pi)$ is not concave, this variational algorithm can lead to local optima in the same way as for likelihood optimization.
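
The alternating maximization described above can be sketched in a few lines. This is only an illustrative sketch, not the authors' implementation: the update rules are obtained here by setting the partial derivatives of the expression in Eq. (11) to zero for a directed graph without self-loops, and the function names (`variational_bound`, `e_step`, `m_step`, `fit_sbm`), the random initialization, and the clipping constant `eps` are assumptions made for the example.

```python
import numpy as np

def variational_bound(X, tau, alpha, pi):
    """Evaluate the variational log-likelihood of Eq. (11) (directed graph, no self-loops)."""
    n = X.shape[0]
    off = 1.0 - np.eye(n)                    # mask selecting pairs i != j
    A, B = X * off, (1.0 - X) * off          # edges / non-edges among pairs i != j
    # sum_{i != j} sum_{q,l} b_ij(q,l) tau_iq tau_jl
    fit = (np.sum((tau.T @ A @ tau) * np.log(pi))
           + np.sum((tau.T @ B @ tau) * np.log1p(-pi)))
    # sum_i sum_q tau_iq (log tau_iq - log alpha_q)
    penalty = np.sum(tau * (np.log(tau) - np.log(alpha)))
    return fit - penalty

def e_step(X, tau, alpha, pi, eps=1e-10):
    """One fixed-point update of tau with (alpha, pi) held fixed (assumed update rule)."""
    n = X.shape[0]
    off = 1.0 - np.eye(n)
    A, B = X * off, (1.0 - X) * off
    lp, lq = np.log(pi), np.log1p(-pi)
    # contributions of out-going pairs (i, j) and in-coming pairs (j, i)
    s = A @ tau @ lp.T + B @ tau @ lq.T + A.T @ tau @ lp + B.T @ tau @ lq
    s = s + np.log(alpha)
    s -= s.max(axis=1, keepdims=True)        # stabilize the exponential
    new_tau = np.clip(np.exp(s), eps, None)  # keep entries positive so log(tau) is finite
    return new_tau / new_tau.sum(axis=1, keepdims=True)

def m_step(X, tau, eps=1e-10):
    """Maximize Eq. (11) in (alpha, pi) with tau held fixed."""
    n = X.shape[0]
    off = 1.0 - np.eye(n)
    alpha = tau.mean(axis=0)
    pi = (tau.T @ (X * off) @ tau) / (tau.T @ off @ tau)
    return alpha, np.clip(pi, eps, 1.0 - eps)

def fit_sbm(X, Q, n_iter=50, seed=0):
    """Alternate m_step and e_step from a random start; may stop at a local optimum."""
    rng = np.random.default_rng(seed)
    tau = np.clip(rng.dirichlet(np.ones(Q), size=X.shape[0]), 1e-10, None)
    tau /= tau.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        alpha, pi = m_step(X, tau)
        tau = e_step(X, tau, alpha, pi)
    alpha, pi = m_step(X, tau)
    return tau, alpha, pi
```

Each sweep costs a few $n \times n$ by $n \times Q$ matrix products, in line with the quadratic cost in $n$ mentioned earlier. Because the objective is not concave, several random restarts (different `seed` values) would typically be compared through `variational_bound`.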



In the sequel, the same notation as in Section 3 is used. In particular it is assumed that a realization of SBM is observed, which has been generated from the sequence of true labels $Z = z^*$. In this setting, $P^* = P_{\alpha^*,\pi^*}$ (Section 3.1.2) denotes the conditional distribution $P(\cdot \mid Z = z^*)$ given the whole label sequence. The first result provides some assurance about the reliability of the variational approximation to $P^{X_{[n]}} = P_{\alpha^*,\pi^*}[Z_{[n]} = \cdot \mid X_{[n]}]$ (Section 3.1.2).

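Proposition 4.1. For every $n$, let $\mathcal{D}_n$ denote the set defined by (9), and $P^{X_{[n]}}(\cdot)$ be the distribution of $Z_{[n]}$ given $X_{[n]}$. Then, assuming (A1)-(A3) hold,

\[ K(R_{X_{[n]}}, P^{X_{[n]}}) := \inf_{D \in \mathcal{D}_n} K(D, P^{X_{[n]}}) \xrightarrow[n \to +\infty]{} 0 , \quad P^* - a.s. \]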



Note that this convergence result is given with respect to $P^*$ (and not to $P$). Stronger results can be obtained (see Section 4.1) thanks to fast convergence rates. Proposition 4.1 yields some confidence in the reliability of the variational approximation, which gets closer to $P^{X_{[n]}}$ as $n$ tends to $+\infty$. However, it does not provide any guarantee about the good behavior of variational estimators, which is precisely the goal of the following Section 4.2.

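Proof of Proposition 4.1.

By definition of the variational approximation,

\[ K(R_{X_{[n]}}, P^{X_{[n]}}) \le K(\delta_{z^*_{[n]}}, P^{X_{[n]}}) , \]

where $\delta_{z^*_{[n]}} = \prod_{1 \le i \le n} \delta_{z_i^*} \in \mathcal{D}_n$. Then,

\[ K(R_{X_{[n]}}, P^{X_{[n]}}) \le K(\delta_{z^*_{[n]}}, P^{X_{[n]}}) = -\log\big[ P(Z_{[n]} = z^*_{[n]} \mid X_{[n]}) \big] , \]

since

\[ K(\delta_{z^*_{[n]}}, P^{X_{[n]}}) = \sum_{z_{[n]}} \delta_{z^*_{[n]}}(z_{[n]}) \log\Big[ \frac{\delta_{z^*_{[n]}}(z_{[n]})}{P^{X_{[n]}}(z_{[n]})} \Big] = -\log\big[ P(Z_{[n]} = z^*_{[n]} \mid X_{[n]}) \big] . \]

The conclusion results from Theorem 3.1, and Corollary 3.2 since

\[ P(Z_{[n]} = z^*_{[n]} \mid X_{[n]}) \xrightarrow[n \to +\infty]{} 1 , \quad P^* - a.s. \]

□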
4.2. Consistency of the variational estimators

Since the variational log-likelihood $\mathcal{J}(\cdot; \cdot, \cdot, \cdot)$ (10) is defined from the log-likelihood $\mathcal{L}_2(\cdot; \cdot, \cdot)$, the properties of $\mathcal{J}(X_{[n]}; \tau_{[n]}, \alpha, \pi)$ are strongly connected to those of $\mathcal{L}_2(X_{[n]}; \alpha, \pi)$. Therefore, the strategy followed in the present section is very similar to that of Section 3. In particular, the consistency of $\tilde\pi$ (VE of $\pi^*$, see (12)) is addressed first. The consistency of the VE of $\alpha^*$ ($\tilde\alpha$, see (12)) is handled in a second step and depends on the convergence rate of the estimator of $\pi$.

The first step consists in applying Theorem 3.4 to settle the consistency of $\tilde\pi$. The following results aim at justifying the use of Theorem 3.4 by checking its assumptions.

Theorem 4.2 states that $\mathcal{L}_2$ and $\mathcal{J}$ are asymptotically equivalent uniformly with respect to $\alpha$ and $\pi$.

Theorem 4.2. With the same notation as Theorem 3.6 and Section 4.1, let us define

\[ J_n(\alpha, \pi) := \frac{1}{n(n-1)} \mathcal{J}(X_{[n]}; \hat\tau_{[n]}, \alpha, \pi) . \]

Then, (A2) and (A3) lead to

\[ \sup_{\alpha,\pi} \{ |J_n(\alpha, \pi) - M_n(\alpha, \pi)| \} = o(1) , \quad P - a.s. , \]

where the supremum is computed over sets fulfilling (A2) and (A3).

This statement is stronger than Proposition 4.1 in several respects. On the one hand, convergence applies almost surely with respect to $P$ and not $P^*$. On the other hand, Theorem 4.2 exhibits the convergence rate toward 0, which is not faster than $n(n-1)$.

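Proof of Theorem 4.2.

From the definitions of $\mathcal{L}_1$, $\mathcal{L}_2$, and $\mathcal{J}$ (respectively given by Eq. (2), Eq. (1), and Eq. (10)) and recalling $\hat z_{[n]} = \hat z_{[n]}(\pi) = \mathrm{Argmax}_{z_{[n]}} \mathcal{L}_1(X_{[n]}; z_{[n]}, \pi)$, Lemma F.1 leads to

\[ \mathcal{J}(X_{[n]}; \hat\tau_{[n]}, \alpha, \pi) \le \mathcal{L}_2(X_{[n]}; \alpha, \pi) \le \mathcal{L}_1(X_{[n]}; \hat z_{[n]}, \pi) . \]

Then applying (A3) and Lemma F.2, there exists $0 < \gamma < 1$ independent of $(\alpha, \pi)$ such that

\[ |\mathcal{J}(X_{[n]}; \hat\tau_{[n]}, \alpha, \pi) - \mathcal{L}_1(X_{[n]}; \hat z_{[n]}, \pi)| \le n \log(1/\gamma) . \]

The conclusion results straightforwardly. □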
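
For completeness, the final step of this proof can be written out as follows. This is only a worked restatement of the two bounds quoted in the proof, under the assumption (taken from the notation of Appendix D) that $M_n$ is the log-likelihood $\mathcal{L}_2$ divided by $n(n-1)$.

```latex
% Sandwich J <= L_2 <= L_1 gives 0 <= L_2 - J <= L_1 - J <= n log(1/gamma).
\[
0 \;\le\; M_n(\alpha,\pi) - J_n(\alpha,\pi)
  \;=\; \frac{\mathcal{L}_2(X_{[n]};\alpha,\pi) - \mathcal{J}(X_{[n]};\hat\tau_{[n]},\alpha,\pi)}{n(n-1)}
  \;\le\; \frac{n\log(1/\gamma)}{n(n-1)}
  \;=\; \frac{\log(1/\gamma)}{n-1}
  \;\xrightarrow[n\to+\infty]{}\; 0 .
\]
```

Since $\gamma$ does not depend on $(\alpha, \pi)$, the bound holds uniformly over the parameter set.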

The consistency of $\tilde\pi$ is provided by the following result, which is a simple consequence of Theorem 4.2, Proposition 3.5, and Theorem 3.4.

Corollary 4.3. With the notation of Theorem 4.2 and assuming (A1), (A2), and (A3) hold, let us define the VE of $(\alpha^*, \pi^*)$

\[ (\tilde\alpha, \tilde\pi) = \mathrm{Argmax}_{\alpha,\pi} J_n(\alpha, \pi) . \]

Then for any distance $d(\cdot, \cdot)$ on the set of $\pi$ parameters,

\[ d(\tilde\pi, \pi^*) \xrightarrow[n \to +\infty]{P} 0 . \]

The proof is completely similar to that of Corollary 3.7 and is therefore not reproduced here.

Finally, the consistency of the VE of $\alpha^*$ is derived from the same deconditioning argument as the one used for the MLE (proof of Theorem 3.9). Consistency for $\tilde\alpha$ is stated by the following result, where a convergence rate of $1/n$ is assumed for $\tilde\pi$. Note that at least some empirical evidence and heuristics exist (see Gazal et al., 2011) in favour of this rate.

Theorem 4.4. Let us assume the VE $\tilde\pi$ converges at rate $1/n$ to $\pi^*$. With the same assumptions as Theorem 4.2 and assuming (A1), (A2), and (A3) hold, then

\[ d(\tilde\alpha, \alpha^*) \xrightarrow[n \to +\infty]{P} 0 , \]

where $d$ denotes any distance between vectors in $\mathbb{R}^Q$.
The crux of the proof is the use of Proposition 3.8.

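Proof of Theorem 4.4.

Let us show that given $Z_{[n]} = z^*_{[n]}$,

\[ |\tilde\alpha_q - N_q(z^*_{[n]})/n| \xrightarrow[n \to +\infty]{P^*} 0 . \]

For every $q$,

\[ \tilde\alpha_q = \frac{1}{n} \sum_{i=1}^n \hat\tau_{i,q} , \]

where $\hat\tau_{i,q} = \hat\tau_{i,q}(\tilde\alpha, \tilde\pi)$ (see (12)). Introducing $z^*$, it comes that

\[ \tilde\alpha_q - N_q(z^*_{[n]})/n = \frac{1}{n} \sum_{i=1}^n (\hat\tau_{i,q} - 1)\, 1_{(z_i^* = q)} + \frac{1}{n} \sum_{i=1}^n \hat\tau_{i,q}\, 1_{(z_i^* \ne q)} . \]

From (9), let us consider the a posteriori distribution of $\tilde Z_{[n]} = (\tilde Z_1, \ldots, \tilde Z_n)$ denoted by

\[ D_{\hat\tau_{[n]}}(z_{[n]}) = \prod_{i=1}^n \hat\tau_{i,z_i} . \]

Then,

\[ |\tilde\alpha_q - N_q(z^*_{[n]})/n| \le \frac{1}{n} \sum_{i=1}^n (1 - \hat\tau_{i,q})\, 1_{(z_i^* = q)} + \frac{1}{n} \sum_{i=1}^n \hat\tau_{i,q}\, 1_{(z_i^* \ne q)} \le \frac{1}{n} \sum_{i=1}^n (1 - \hat\tau_{i,z_i^*})\, 1_{(z_i^* = q)} + \frac{1}{n} \sum_{i=1}^n (1 - \hat\tau_{i,z_i^*})\, 1_{(z_i^* \ne q)} = \frac{1}{n} \sum_{i=1}^n (1 - \hat\tau_{i,z_i^*}) , \]

using that when $z_i^* \ne q$, $\hat\tau_{i,q} \le \sum_{q' \ne z_i^*} \hat\tau_{i,q'} = 1 - \hat\tau_{i,z_i^*}$. Hence,

\[ |\tilde\alpha_q - N_q(z^*_{[n]})/n| \le \frac{1}{n} \sum_{i=1}^n (1 - \hat\tau_{i,z_i^*}) \le 1 - D_{\hat\tau_{[n]}}(z^*_{[n]}) . \]

It remains to show $D_{\hat\tau_{[n]}}(z^*_{[n]}) \to 1$ at a rate which does not depend on $z^*_{[n]}$. Let $\tilde P = P_{\tilde\alpha,\tilde\pi}(Z_{[n]} = \cdot \mid X_{[n]})$ denote the a posteriori distribution of $Z_{[n]}$ with parameters $(\tilde\alpha, \tilde\pi)$ (Section 3.1.2). Since Lemma F.4 provides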



< W--log 



the conclusion results from another use of Proposition 3.8 applied with $\hat\pi = \tilde\pi$ and $v_n = 1/n$.

□ 



5. Conclusion 

This paper provides theoretical (asymptotic) results about the stochastic block 
model (SBM) inference. Identifiability of SBM parameters has been proved for 
directed (and undirected) graphs. This is typically the setting of real applications 
such as biological networks. 

In particular, asymptotic equivalence between maximum-likelihood and vari- 
ational estimators is proved, as well as the consistency of resulting estimators 
(up to an additional assumption for the group proportions). To the best of our 
knowledge, these are the first results of this type for variational estimators of the 
SBM parameters. Such theoretical properties are essential since they validate 
the empirical practice which uses variational approaches as a reliable means to 
deal with up to several thousands of vertices. 

Besides, this work can be seen as a preliminary step toward a deeper anal- 
ysis of maximum-likelihood and variational estimators of SBM parameters. In 
particular a further interesting question is the choice of the number Q of classes 
in the mixture model. Indeed it is crucial to develop a data-driven strategy to 
choose Q in order to make the variational approach fully applicable in practice 
and validate the empirical practice. 



imsart-ejs ver. 2010/04/27 file: SBM_Var_MLE_EJS.tex date: January 21, 2013 



Celisse, Daudin, Pierre/MLE and variational estimators in SBM 






Appendix A: Proof of Theorem 2.2 

Proof of Theorem 2.2. 

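Let us just assume $Q = 2$, $n = 4$, and that no element of $\alpha$ is zero.

If the coordinates of $r = \pi\alpha$ are distinct, then Theorem 2.1 applies and the desired result follows.

Otherwise, the two coordinates of $r$, $r'$ and $r''$ are not distinct. Set $r_1 = r_2 = a$ and $u_i = \alpha_1 r_1^i + \alpha_2 r_2^i$, for $i \ge 0$. Let us also define $b = r'_1 = r'_2$ and $c = r''_1 = r''_2$. Then, the following equalities hold:

\[ a = \pi_{11}\alpha_1 + \pi_{12}\alpha_2 = \pi_{21}\alpha_1 + \pi_{22}\alpha_2 , \]
\[ b = \pi_{11}\alpha_1 + \pi_{21}\alpha_2 = \pi_{12}\alpha_1 + \pi_{22}\alpha_2 , \]
\[ c = \pi_{11}^2\alpha_1 + \pi_{21}\pi_{12}\alpha_2 = \pi_{12}\pi_{21}\alpha_1 + \pi_{22}^2\alpha_2 . \]

From $a - b = (\pi_{12} - \pi_{21})\alpha_2 = -(\pi_{12} - \pi_{21})\alpha_1$ we deduce $\pi_{12} = \pi_{21}$ and $a = b$. Then,

\[ \alpha_1\alpha_2(\pi_{11} - \pi_{12})^2 = (\alpha_1 + \alpha_2)(\alpha_1\pi_{11}^2 + \alpha_2\pi_{12}^2) - (\alpha_1\pi_{11} + \alpha_2\pi_{12})^2 = \alpha_1\alpha_2(\pi_{22} - \pi_{12})^2 . \]

If $c = a^2$, then $\pi_{11} = \pi_{12} = \pi_{21} = \pi_{22} = a$ and $\alpha$ cannot be found.

If $c \ne a^2$, then $|\pi_{11} - \pi_{12}| = |\pi_{22} - \pi_{12}| \ne 0$. But $\alpha_1(\pi_{11} - \pi_{12}) = a - \pi_{12} = b - \pi_{12} = \alpha_2(\pi_{22} - \pi_{12})$ leads to $|\alpha_1| = |\alpha_2|$ and $\alpha_1 = \alpha_2 = 1/2$. Hence $\pi_{11} = \pi_{22}$. Then, $\pi_{11}$ and $\pi_{12}$ are the roots of the polynomial $x^2 - 2ax + 2a^2 - c$.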

At this stage, we need to distinguish between $\pi_{11}$ and $\pi_{12}$. Let us introduce the probability $d$ that $X_{[n]}$ fits the pattern



Then, $d = (\pi_{11}^3 + 3\pi_{11}\pi_{12}^2)/4$ and one can compute $e = \sqrt[3]{d - a^3} = (\pi_{11} - \pi_{12})/2$. This leads to $\pi_{11} = \pi_{22} = a + e$ and $\pi_{12} = \pi_{21} = a - e$, which yields the conclusion. □
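
The algebra of the last case ($\alpha_1 = \alpha_2 = 1/2$, $\pi_{11} = \pi_{22}$, $\pi_{12} = \pi_{21}$) can be checked numerically. The sketch below is only an illustration: the helper names `moments`, `roots_from_moments` and `recover_pi` are hypothetical, and the expression used for $d$ is the one stated in the proof, $d = (\pi_{11}^3 + 3\pi_{11}\pi_{12}^2)/4$.

```python
import numpy as np

def moments(p, s):
    """Quantities a, c, d of the proof when alpha = (1/2, 1/2), pi_11 = pi_22 = p, pi_12 = pi_21 = s."""
    a = 0.5 * (p + s)
    c = 0.5 * (p ** 2 + s ** 2)
    d = (p ** 3 + 3.0 * p * s ** 2) / 4.0
    return a, c, d

def roots_from_moments(a, c):
    """Roots of x^2 - 2 a x + 2 a^2 - c; they give {pi_11, pi_12} without their order."""
    return np.sort(np.real(np.roots([1.0, -2.0 * a, 2.0 * a ** 2 - c])))

def recover_pi(a, d):
    """Resolve the ordering: e = cbrt(d - a^3) = (pi_11 - pi_12) / 2."""
    e = np.cbrt(d - a ** 3)
    return a + e, a - e          # (pi_11, pi_12)

a, c, d = moments(0.7, 0.2)
print(roots_from_moments(a, c))  # unordered pair of connectivity values
print(recover_pi(a, d))          # ordered pair (pi_11, pi_12)
```

The quadratic alone cannot tell which root is $\pi_{11}$; the sign of $e$ does, which is why the third-order quantity $d$ is introduced in the proof.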
Appendix B: Proof of Theorem 3.1

B.1. Preliminaries

Assuming (A1) holds true, $\pi^*$ can be permutation-invariant (see Section 3.1.1). For this reason, we will consider equivalence classes denoted by $[z_{[n]}]$ for the label vector $z_{[n]}$.

Let us define $P^{X_{[n]}}(z_{[n]}) = P[Z_{[n]} = z_{[n]} \mid X_{[n]}]$ for every label vector $z_{[n]}$, and $P^{X_{[n]}}([z_{[n]}]) = P([Z_{[n]}] = [z_{[n]}] \mid X_{[n]})$ for the corresponding class $[z_{[n]}]$. Since every $z'_{[n]} \in [z_{[n]}]$ satisfies $P^{X_{[n]}}(z'_{[n]}) = P^{X_{[n]}}(z_{[n]})$, it results that

\[ P^{X_{[n]}}([z_{[n]}]) = \sum_{z'_{[n]} \in [z_{[n]}]} P^{X_{[n]}}(z'_{[n]}) = |[z_{[n]}]|\, P^{X_{[n]}}(z_{[n]}) , \qquad (13) \]

where $|[z_{[n]}]|$ denotes the cardinality of $[z_{[n]}]$.



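B.2. Upper bounding $P\Big[ \sum_{[z_{[n]}] \ne [z^*_{[n]}]} \frac{P([Z_{[n]}] = [z_{[n]}] \mid X_{[n]})}{P([Z_{[n]}] = [z^*_{[n]}] \mid X_{[n]})} > t \ \Big|\ Z = z^* \Big]$

Using $P^*$ instead of $P[\,\cdot \mid Z = z^*]$ for simplicity, let us first notice

\[ \sum_{[z_{[n]}] \ne [z^*_{[n]}]} \frac{P^{X_{[n]}}([z_{[n]}])}{P^{X_{[n]}}([z^*_{[n]}])} = \sum_{[z_{[n]}] \ne [z^*_{[n]}]} \sum_{z'_{[n]} \in [z_{[n]}]} \frac{P^{X_{[n]}}(z'_{[n]})}{|[z^*_{[n]}]|\, P^{X_{[n]}}(z^*_{[n]})} \le \sum_{z_{[n]} \notin [z^*_{[n]}]} \frac{P^{X_{[n]}}(z_{[n]})}{P^{X_{[n]}}(z^*_{[n]})} , \]

by (13) applied to $[z^*_{[n]}]$. Partitioning according to the number of differences between $z_{[n]}$ and $z^*_{[n]}$, it comes

\[ \sum_{z_{[n]} \notin [z^*_{[n]}]} \frac{P^{X_{[n]}}(z_{[n]})}{P^{X_{[n]}}(z^*_{[n]})} = \sum_{r=1}^n \ \sum_{z_{[n]} \notin [z^*_{[n]}],\ \|z_{[n]} - z^*_{[n]}\|_0 = r} \frac{P^{X_{[n]}}(z_{[n]})}{P^{X_{[n]}}(z^*_{[n]})} . \qquad (14) \]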



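Note that the number of vectors $z_{[n]}$ such that $z_{[n]} \notin [z^*_{[n]}]$ and $\|z_{[n]} - z^*_{[n]}\|_0 = r$ is roughly upper bounded by $\binom{n}{r}(Q-1)^r$, this upper bound being reached for instance when $\pi^*$ is such that $\pi^*_{q,l} \ne \pi^*_{q',l'}$ for every $(q,l) \ne (q',l')$. Then, a straightforward union bound leads to

\[ P^*\Big[ \sum_{z_{[n]} \notin [z^*_{[n]}]} \frac{P^{X_{[n]}}(z_{[n]})}{P^{X_{[n]}}(z^*_{[n]})} > t \Big] \le \sum_{r=1}^n \ \sum_{z_{[n]} \notin [z^*_{[n]}],\ \|z_{[n]} - z^*_{[n]}\|_0 = r} P^*\Big[ \frac{P^{X_{[n]}}(z_{[n]})}{P^{X_{[n]}}(z^*_{[n]})} > \frac{t}{n^{r+1}(Q-1)^r} \Big] , \]

by use of $\binom{n}{r} \le n^r$, which is tight enough for our purpose.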



B.3. Upper bounding 



> 



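Let us first notice that for all vectors $z_{[n]}$ and $z^*_{[n]}$,

\[ \log\Big( \frac{P^{X_{[n]}}(z_{[n]})}{P^{X_{[n]}}(z^*_{[n]})} \Big) = \sum_{i \ne j} \Big\{ X_{ij} \log\Big( \frac{\pi^*_{z_i,z_j}}{\pi^*_{z_i^*,z_j^*}} \Big) + (1 - X_{ij}) \log\Big( \frac{1 - \pi^*_{z_i,z_j}}{1 - \pi^*_{z_i^*,z_j^*}} \Big) \Big\} + \sum_i \log\Big( \frac{\alpha^*_{z_i}}{\alpha^*_{z_i^*}} \Big) . \qquad (15) \]

Note that for any vector $z_{[n]}$ such that $\pi^*_{z_i,z_j} \in \{0, 1\}$ and $\pi^*_{z_i,z_j} \ne \pi^*_{z_i^*,z_j^*}$, $\log\big( P^{X_{[n]}}(z_{[n]}) / P^{X_{[n]}}(z^*_{[n]}) \big) = -\infty$ and $P^{X_{[n]}}(z_{[n]}) = 0$. Then such vector $z_{[n]}$ can be removed from the sum in Eq. (14).

Second, for any vectors $z_{[n]}$ and $z'_{[n]}$, let us further define

\[ D(z_{[n]}, z'_{[n]}) = \big\{ (i,j) \mid i \ne j,\ \pi^*_{z_i,z_j} \ne \pi^*_{z'_i,z'_j} \big\} , \]

where $z_i$ and $z'_j$ respectively refer to the $i$-th (resp. $j$-th) coordinate of vector $z_{[n]}$ (resp. $z'_{[n]}$). Note that $|D(z_{[n]}, z^*_{[n]})|$ remains unchanged if $z_{[n]}$ and $z^*_{[n]}$ are replaced by any representatives of their respective classes.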
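
To make the role of this set of pairs concrete, here is a small sketch that enumerates the ordered pairs $(i, j)$, $i \ne j$, on which two labelings induce different connection probabilities, that is, pairs with $\pi^*_{z_i,z_j} \ne \pi^*_{z'_i,z'_j}$. The function name `difference_set`, the 0-based labels and the toy matrix are assumptions made for the example.

```python
import numpy as np

def difference_set(pi_star, z, z_prime):
    """Ordered pairs (i, j), i != j, where the two labelings give different edge probabilities."""
    z, z_prime = np.asarray(z), np.asarray(z_prime)
    P = pi_star[np.ix_(z, z)]                     # P[i, j] = pi*[z_i, z_j]
    P_prime = pi_star[np.ix_(z_prime, z_prime)]   # same for the second labeling
    mask = P != P_prime
    np.fill_diagonal(mask, False)                 # self-pairs are excluded
    return [(int(i), int(j)) for i, j in zip(*np.nonzero(mask))]

pi_star = np.array([[0.9, 0.1],
                    [0.2, 0.6]])
z_star = [0, 0, 1]      # reference labels (0-based)
z = [0, 1, 1]           # differs from z_star at vertex 1 only
print(difference_set(pi_star, z, z_star))
```

Only these pairs contribute non-zero terms to a log-ratio of posterior probabilities of the two labelings, so their number controls how sharply the data can separate them.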

If $N_r(z_{[n]}) = |D(z_{[n]}, z^*_{[n]})|$ denotes the number of terms in the sum of






Eq. (15), then 

>^H(z[„]) 



> 



log r,X.- ,^ . N > log 



p* 



lof 



1 /. t 



log 



?Z=z' 



?Z=z' 



> 



log 



Finally, Hoeffding's inequality (Proposition B.1), applied with the bounds $a_{ij}$ and $b_{ij}$ given by Lemma B.2 and (A2) and with $L = 2(b_{ij} - a_{ij})$, provides for any $s > 0$,
p* ^ 



( P^H(z,„j) 



?Z — Z* 



l02 



P^H(,j;j) 



< 



> s 



exp 



-jV^(z[„])5'^ 
P2 



B.4. Conclusion

One then applies this last inequality with a particular choice of $s$:



1 



loa 



t 



7V,(z[„]) \^ "n'-+i(g-i)'- 

which leads to 

^ _ log t-{r + l) log(n) - r log(Q - 1) 

Nr{Z[n]) 



E 



lot 



Nr{z[n]) 



E 



Z=z* 



With Lemma B.3, it is not difficult to show that for large enough values of n, 



4 \Nriz[„]) 



logT^x — 



and that 



exp 



-Nr{zin])i 
P2 



< exp 



p^i"i(zr,) 

3iV,(z[„])(c*)2 



4L2 
Using Proposition B.4, it results that



and 



< 



cxp 



3(7c*)^ 
8L2 



El [(Q-iKf 



where u„ = cxp 



8L2 



Finally for every $t > 0$, $\log(1 + x) \le x$ for every $x \ge 0$ implies



^ p-^H(rz* 1) 



> 1 1 z = z* 



< (1 + (Q - l)?/„)" - 1 



< e 



(Q-l)n«„ 



1 



^ 



since $n u_n \to 0$ as $n \to +\infty$. Further noticing that the upper bound does not depend on $z^*_{[n]}$, the same result holds with $P^*$ replaced by $P$.

B.5. Hoeffding's inequality and related lemmas



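Proposition B.1 (Hoeffding's inequality). Let $\{Y_{i,j}\}_{1 \le i \ne j \le n}$ denote independent random variables such that for every $i \ne j$, $Y_{i,j} \in [a_{i,j}, b_{i,j}]$ almost surely. Then, for any $t > 0$,

\[ P\Big[ \sum_{i \ne j} (Y_{i,j} - E[Y_{i,j}]) > t \Big] \le \exp\Big[ \frac{-2t^2}{\sum_{i \ne j} (b_{i,j} - a_{i,j})^2} \Big] . \]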
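
As a quick sanity check of how conservative such a bound is, the sketch below compares the standard form of Hoeffding's inequality, $P[\sum (Y - E Y) > t] \le \exp(-2t^2 / \sum (b - a)^2)$, with a Monte Carlo estimate for independent Bernoulli variables. The function names, the Bernoulli setting and the simulation sizes are assumptions chosen for illustration; the normalization used in the paper's own statement may differ.

```python
import numpy as np

def hoeffding_bound(t, a, b):
    """Standard Hoeffding bound for a sum of independent variables with ranges [a_k, b_k]."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.exp(-2.0 * t ** 2 / np.sum((b - a) ** 2)))

def empirical_tail(p, t, n_rep=20000, seed=0):
    """Monte Carlo estimate of P[sum_k (Y_k - p_k) > t] for independent Bernoulli(p_k)."""
    rng = np.random.default_rng(seed)
    Y = rng.random((n_rep, p.size)) < p          # one row per replication
    S = (Y - p).sum(axis=1)                      # centered sums
    return float(np.mean(S > t))

p = np.full(50, 0.3)                             # 50 edges with probability 0.3
bound = hoeffding_bound(5.0, np.zeros(50), np.ones(50))
print(bound, empirical_tail(p, 5.0))
```

The empirical tail is typically much smaller than the bound, which is expected: the inequality only uses the range of the variables, not their variance.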



Lemma B.2 (Values of $a_{ij}$ and $b_{ij}$). Assuming (A2) holds for $\pi^*$ with $\zeta > 0$, it comes for every $1 \le i \ne j \le n$,



Xij log 



(i 



< 2 log 



1-C 
Lemma B.3 (Bounding the conditional expectation). Let us assume (A1), (A2), (A3), and (A4) hold true. Then for all label vectors $z_{[n]}$ and $z^*_{[n]}$ such that $\|z_{[n]} - z^*_{[n]}\|_0 = r$ ($1 \le r \le n$), there exist positive constants $c^* = c(\pi^*)$ and $C^* = C(\pi^*)$ such that

\[ 0 < c^* \le -\frac{1}{N_r(z_{[n]})}\, E^{Z = z^*}\Big[ \log \frac{P^{X_{[n]}}(z_{[n]})}{P^{X_{[n]}}(z^*_{[n]})} \Big] \le C^* , \]

where $N_r(z_{[n]}) = \big| \{ (i,j) \mid i \ne j,\ \pi^*_{z_i,z_j} \ne \pi^*_{z_i^*,z_j^*} \} \big|$.



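Proof of Lemma B.3. First,

\[ E^{Z = z^*}\Big[ \log \frac{P^{X_{[n]}}(z_{[n]})}{P^{X_{[n]}}(z^*_{[n]})} \Big] = \sum_{i \ne j} \Big\{ \pi^*_{z_i^*,z_j^*} \log\Big( \frac{\pi^*_{z_i,z_j}}{\pi^*_{z_i^*,z_j^*}} \Big) + (1 - \pi^*_{z_i^*,z_j^*}) \log\Big( \frac{1 - \pi^*_{z_i,z_j}}{1 - \pi^*_{z_i^*,z_j^*}} \Big) \Big\} + \sum_i \log \frac{\alpha^*_{z_i}}{\alpha^*_{z_i^*}} . \]

Note that the first sum in the above expression is actually taken over $(i,j)$ such that $\pi^*_{z_i,z_j} \ne \pi^*_{z_i^*,z_j^*}$.

Second, let us introduce

\[ C^* := \max\{ 2k(\pi^*_{q,l}, \pi^*_{q',l'}) \} \quad \text{and} \quad c^* := \min\{ k(\pi^*_{q,l}, \pi^*_{q',l'})/2 \} , \]

where maximum and minimum are taken over $\{ ((q,l),(q',l')) \mid \pi^*_{q,l} \ne \pi^*_{q',l'} \}$, and $k(x,y) = x \log(x/y) + (1 - x) \log[(1 - x)/(1 - y)]$ for every $x, y \in (0,1)$. Then for every $(i,j)$ such that $\pi^*_{z_i,z_j} \ne \pi^*_{z_i^*,z_j^*}$,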



< c* < 



Third, (A3) implies that 
tion B.4 entail 



log -4^ 



< log^-^. Therefore (A4) and Proposi- 



Vl ^ < 



log < log 



^ 



'J n— 
The conclusion follows for every $n$ such that

12, 1-7 c* _ 

--iog-;-<y <^ 



n 7^ 



7 



□ 



Proposition B.4. Let $z_{[n]}$ and $z^*_{[n]}$ denote two label vectors. If (A1) and (A4) hold true, then



> — n 



where $D(z_{[n]}, z'_{[n]}) = \big\{ (i,j) \mid i \ne j,\ \pi^*_{z_i,z_j} \ne \pi^*_{z'_i,z'_j} \big\}$, $\gamma > 0$ is the constant given by (A4), and

Proof of Proposition B.4.

Since one assumes (A1) holds true, $\pi^*$ can be permutation-invariant (see Example 2). Then, let us define $\pi^\sigma = (\pi_{\sigma(q),\sigma(l)})_{1 \le q,l \le Q}$ with $\sigma$ a permutation on $\{1, \ldots, Q\}$. Note that for a permutation-invariant matrix $\pi$, there exists a permutation $\sigma \ne Id$ on $\{1, \ldots, Q\}$ such that $\pi^\sigma = \pi$. Then, the following equalities hold

with $\sigma(z_{[n]}) = (\sigma(z_1), \sigma(z_2), \ldots, \sigma(z_n))$. Furthermore, neither $D(z_{[n]}, z^*_{[n]})$ nor $\|z_{[n]} - z^*_{[n]}\|_0$ will change if the same permutation is applied to the coordinates of vectors $z_{[n]}$ and $z^*_{[n]}$. Then, computing $D(z_{[n]}, z^*_{[n]})$ can be made by reordering $z_{[n]}$ and $z^*_{[n]}$.

Assumption (A4) implies that the number of coordinates of $z^*_{[n]}$ that are equal to 1 is at least $n_\gamma := \lceil n\gamma \rceil$, where $\lceil n\gamma \rceil$ denotes the first integer larger than $n\gamma$. The same property holds for every $1 \le q \le Q$. Let us use a permutation of the coordinates of $z^*_{[n]}$ such that

\[ z^*_{[n]} = (1, 2, \ldots, Q,\ 1, 2, \ldots, Q,\ \ldots,\ 1, 2, \ldots, Q,\ z^*_{Qn_\gamma + 1}, \ldots, z^*_n) , \]

and apply the same permutation to $z_{[n]}$. For each block $k$ of $Q$ coordinates $(1, \ldots, Q)$ of $z^*_{[n]}$, let us introduce a mapping $\sigma_k(\cdot)$, where $k$ denotes the number of the block in $z^*_{[n]}$, such that

\[ \forall k, \quad kQ + 1 \le i \le (k+1)Q, \quad \sigma_k(z_i^*) = z_i . \]

Then it comes

\[ z_{[n]} = (\sigma_1(1), \sigma_1(2), \ldots, \sigma_1(Q),\ \sigma_2(1), \sigma_2(2), \ldots, \sigma_2(Q),\ \ldots,\ \sigma_{n_\gamma}(1), \ldots, \sigma_{n_\gamma}(Q),\ z_{Qn_\gamma + 1}, \ldots, z_n) . \qquad (16) \]

Note that this reorganization of $z^*_{[n]}$ is not unique. For instance, it is possible to exchange $\sigma_1(3)$ with $\sigma_4(3)$. Each $\sigma_k$ is a function from $\{1, \ldots, Q\}$ to $\{1, \ldots, Q\}$, which is a permutation provided it is injective. Let us choose a reorganization of the coordinates of $z^*_{[n]}$ which minimizes the number of injective $\sigma_k$s.

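Besides,

\[ D(z_{[n]}, z^*_{[n]}) \supseteq \bigcup_{k,k'} \big\{ (i,j) \mid i \ne j,\ i \in I_k,\ j \in I_{k'},\ \pi^*_{z_i,z_j} \ne \pi^*_{z_i^*,z_j^*} \big\} , \]

where $I_k$ denotes the $k$-th block of coordinates of $z^*_{[n]}$. If $k \ne k'$, the requirement that $i \ne j$ is fulfilled. Otherwise for $k = k'$, it is necessary to require that $z_i^* \ne z_j^*$ since all values in $I_k$ are different.

Let us denote by $B(k,k') = |\{ (q,l) \mid \pi^*_{q,l} \ne \pi^*_{\sigma_k(q),\sigma_{k'}(l)} \}|$ and by $B(k) = |\{ (q,l) \mid q \ne l,\ \pi^*_{q,l} \ne \pi^*_{\sigma_k(q),\sigma_k(l)} \}|$. Then, it comes that

\[ |D(z_{[n]}, z^*_{[n]})| \ge \sum_{k \ne k'} \big| \{ (i,j) \mid i \in I_k,\ j \in I_{k'},\ \pi^*_{z_i,z_j} \ne \pi^*_{z_i^*,z_j^*} \} \big| + \sum_k \big| \{ (i,j) \mid i \ne j,\ i, j \in I_k,\ \pi^*_{z_i,z_j} \ne \pi^*_{z_i^*,z_j^*} \} \big| = \sum_{k \ne k'} B(k,k') + \sum_k B(k) . \]

Therefore, lower bounding $|D(z_{[n]}, z^*_{[n]})|$ amounts to assessing the cardinality of $B(k,k')$ and $B(k)$, for $1 \le k \ne k' \le n_\gamma$.
Let us now distinguish between two cases:

1. either for every $k, k' \in \{1, \ldots, n_\gamma\}$, $B(k,k') + B(k',k) > 0$ and $B(k) > 0$.

2. or there exist $k, k'$ such that $B(k,k') + B(k',k) = 0$ or $B(k) = 0$.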
First case: In this setting, let $\|z_{[n]} - z^*_{[n]}\|_0 = r$. Then,




= [ B{k, k') + B{k', k)]+Y^B{k) 




since $n_\gamma \ge n\gamma$ and $n \ge r$.

Second case: Let us assume that there exist $k, k'$ such that $B(k,k') + B(k',k) = 0$. (The $B(k)$s will be lower bounded by 0.)

Then for every such $k, k'$, $\sigma_k$ and $\sigma_{k'}$ are permutations. Indeed such $k, k'$ lead to $\pi_{q,l} = \pi_{\sigma_k(q),\sigma_{k'}(l)} = \pi_{\sigma_{k'}(q),\sigma_k(l)}$, for every $q, l \in \{1, \ldots, Q\}$. Assume furthermore that $\sigma_k(q) = \sigma_k(q')$ for some $q, q' \in \{1, \ldots, Q\}$. Then for every $l \in \{1, \ldots, Q\}$, $\pi_{q,l} = \pi_{\sigma_k(q),\sigma_{k'}(l)} = \pi_{\sigma_k(q'),\sigma_{k'}(l)} = \pi_{q',l}$. Hence, for every $l \in \{1, \ldots, Q\}$, $\pi_{q,l} = \pi_{q',l}$, which implies $q = q'$ using (A1). Therefore, $\sigma_k$ is injective and thus a permutation of $\{1, \ldots, Q\}$. The same property holds for $\sigma_{k'}$, which is also a permutation of $\{1, \ldots, Q\}$.

Furthermore for any such $k, k'$, $\sigma_k = \sigma_{k'} = \sigma$ and $\pi^\sigma = \pi$, where $\sigma$ denotes a permutation of $\{1, \ldots, Q\}$. Indeed if one assumes $\sigma_k \ne \sigma_{k'}$, then there exists $q \in \{1, \ldots, Q\}$ such that $\sigma_k(q) \ne \sigma_{k'}(q)$. If it holds, one can interchange coordinates of $z_{[n]}$: $\sigma_k(q)$ and $\sigma_{k'}(q)$. This results in new mappings $\tilde\sigma_k$ and $\tilde\sigma_{k'}$ between $z^*_{[n]}$ and $z_{[n]}$, which are no longer injective. Then, the number of injective mappings $\sigma_k$ in the writing of $z_{[n]}$ decreases by 2 and is no longer minimal as earlier assumed. This yields $\sigma_k = \sigma_{k'}$ and thus $\pi^\sigma = \pi$. Note that the existence of such a unique permutation $\sigma$ results from the fact that for every $i > Qn_\gamma$, $z_i = \sigma_k(z_i^*)$. Indeed if this was not true, the same reasoning as before applies: an interchange between $z_i$ and $\sigma_k(z_i^*)$ would decrease the number of injective $\sigma_k$s in (16), which contradicts our assumption. As consequences, it also comes that $\pi^\sigma = \pi$ and that for every $i > Qn_\gamma$, $z_i = \sigma(z_i^*)$.

Let $m$ denote the number of non-injective mappings $\sigma_k$. Note that for any non-injective mapping $\sigma_k$ ($1 \le k \le n_\gamma$), there exists at least one difference between $z_{[n]}$ and $z^*_{[n]}$ in the corresponding block $k$. Then, the number $r$ of differences satisfies

The conclusion results from

k^k' k k^k' 

n-y{n^ ^ 1) ^ {i^-y — m) [n^ ~ m — 1] 



> 



2 

2mn-y — — m mn^ + m [ n-y — m — 1 ] 



Finally, let us notice that m < n^, and that — 1 > m ~ amounts to say 
that no injective mapping exists in (16). However with the same reasoning 
as before, it means that for every 1 < k,k' < n^, B{k, k') + B{k' , k) > 0, which 
contradicts the assumption. Then, rij — [m, + 1) > and 

L/{zr„l , Zr„i ) > > = > — — > . 

^ _ 2 - 2 2r ~ 2Q ~ 2 

by use of (17) and 7 < 1/Q (see Assumption (A4)). 

□ 
Appendix C: Proof of Proposition 3.5

Proof of Proposition 3.5. Let us first recall that

\[ \phi_n(z_{[n]}, \pi) := \frac{1}{n(n-1)} \mathcal{L}_1(X_{[n]}; z_{[n]}, \pi) , \]

\[ \Phi_n(z_{[n]}, \pi) := E\big[ \phi_n(z_{[n]}, \pi) \mid Z_{[n]} = z^*_{[n]} \big] . \]



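Then,

\[ |\phi_n(z_{[n]}, \pi) - \Phi_n(z_{[n]}, \pi)| = \rho_n \Big| \sum_{i \ne j} (X_{i,j} - \pi^*_{z_i^*,z_j^*}) \log\big[ \pi_{z_i,z_j} / (1 - \pi_{z_i,z_j}) \big] \Big| , \]

where $\rho_n = [n(n-1)]^{-1}$, $\xi_{ij} = X_{i,j} - \pi^*_{z_i^*,z_j^*}$, and $g(t) = \log(t/(1-t))$, $t \in ]0,1[$.

With $g_{ij} = g(\pi_{z_i,z_j})$, let us introduce

\[ S_n(g) = \sum_{i \ne j} \xi_{ij}\, g_{ij} , \]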

where g = {3i.j}i<j-^j<„- Note that on the parameter set V defined by (7), 
9ij \ for every 1 < i j < n. 

The expected control will result from the use of Talagrand's inequality (Theorem C.1). For every $z_{[n]}$ and $\epsilon > 0$, let us introduce the set

\[ \mathcal{P}(z_{[n]}) = \{ \pi \mid (z_{[n]}, \pi) \in \mathcal{P} \} \]

and define the event

\[ \Omega_n(\epsilon; z_{[n]}) = \Big\{ \sup_{\pi \in \mathcal{P}(z_{[n]})} \rho_n S_n(g) \le (1 + \epsilon) \sqrt{\rho_n}\, \Lambda + \sqrt{\rho_n \Gamma^2 x_n} + (1/\epsilon + 1/3)\, \rho_n \Gamma\, x_n \Big\} , \]

where $\Gamma$ and $\Lambda$ are constants respectively defined in Lemmas C.2 and C.3, and $\{x_n\}_n$ is a sequence of positive real numbers to be chosen later. Then Theorem C.1 implies for any $\eta > 0$

\[ P^*\Big[ \sup_{(z_{[n]},\pi) \in \mathcal{P}} |\phi_n(z_{[n]}, \pi) - \Phi_n(z_{[n]}, \pi)| > \eta \Big] \le \sum_{z_{[n]}} P^*\Big[ (1 + \epsilon) \sqrt{\rho_n}\, \Lambda + \sqrt{\rho_n \Gamma^2 x_n} + (1/\epsilon + 1/3)\, \rho_n \Gamma\, x_n > \eta \Big] + \sum_{z_{[n]}} e^{-x_n} . \]

Since $z_{[n]}$ belongs to a set of cardinality at most $Q^n$, choosing $x_n = n \log(n)$ entails the first sum is equal to 0 for large enough values of $n$, while the second sum converges to 0.

Finally, a quick inspection of the proof shows this convergence is uniform with respect to $z^*_{[n]}$, which provides the expected result. □
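
The choice $x_n = n \log(n)$ can be checked directly. The short derivation below is only a worked illustration and assumes the normalization $\rho_n = [n(n-1)]^{-1}$, so that every deterministic term of the threshold carries a factor vanishing with $n$.

```latex
% Second sum: at most Q^n terms, each equal to exp(-x_n).
\[
Q^n e^{-x_n} = \exp\big( n\log Q - n\log n \big) = \Big( \frac{Q}{n} \Big)^{n}
  \xrightarrow[n\to+\infty]{} 0 .
\]
% First sum: the threshold is deterministic and vanishes, since
\[
\rho_n x_n = \frac{n\log n}{n(n-1)} = \frac{\log n}{n-1} \xrightarrow[n\to+\infty]{} 0 ,
\qquad
\sqrt{\rho_n} = \frac{1}{\sqrt{n(n-1)}} \xrightarrow[n\to+\infty]{} 0 ,
\]
% hence it is eventually below any fixed eta > 0 and each probability in the first sum is 0.
```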



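Theorem C.1 (Talagrand). Let $\{Y_{ij}\}_{1 \le i \ne j \le n}$ denote independent centered random variables, and define

\[ \forall g \in \mathcal{G}, \quad S_n(g) = \sum_{i \ne j} Y_{ij}\, g_{ij} , \]

where $\mathcal{G} \subset \mathbb{R}^{n(n-1)}$. Let us further assume that there exist $b > 0$ and $\sigma^2 > 0$ such that $|Y_{ij} g_{ij}| \le b$ for every $(i,j)$, and $\sup_{g \in \mathcal{G}} \sum_{i \ne j} \mathrm{Var}(Y_{ij} g_{ij}) \le \sigma^2$. Then, for every $\epsilon > 0$, and $x > 0$,

\[ P\Big[ \sup_{g \in \mathcal{G}} S_n(g) \ge E\Big[ \sup_{g \in \mathcal{G}} S_n(g) \Big] (1 + \epsilon) + \sqrt{2\sigma^2 x} + b\, (1/\epsilon + 1/3)\, x \Big] \le e^{-x} . \]

Proof. A proof can be found in Massart (2007) (p. 170, Eq. (5.50)).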



□ 



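Lemma C.2. With the same notation as Theorem C.1, Assumption (A2) entails that there exists $\Gamma(\zeta) > 0$ only depending on $\zeta$ such that

\[ \sup_{\mathcal{P}} \max_{i \ne j} |\xi_{ij}\, g_{ij}| \le \Gamma , \quad \text{and} \quad \sup_{\mathcal{P}} \max_{i \ne j} \mathrm{Var}(\xi_{ij}\, g_{ij}) \le \frac{\Gamma^2}{4} . \]

Proof. If $(z_{[n]}, \pi) \in \mathcal{P}$, then

\[ \big( \pi_{z_i,z_j} \in \{0, 1\} \Rightarrow \pi^*_{z_i^*,z_j^*} = \pi_{z_i,z_j} \big) \Rightarrow ( \xi_{ij}\, g_{ij} = 0 ) . \]

Then for every $(z_{[n]}, \pi) \in \mathcal{P}$, there exists $\Gamma = \Gamma(\zeta) > 0$ (Assumption (A2)) such that

\[ \forall i \ne j, \quad |\xi_{ij}\, g_{ij}| \le \Gamma , \]

for every $(z_{[n]}, \pi) \in \mathcal{P}$. This also leads to

\[ \forall i \ne j, \quad \mathrm{Var}(\xi_{ij}\, g_{ij}) \le \Gamma^2/4 . \]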

□ 
Lemma C.3. With the same notation as Proposition 3.5, for every $z_{[n]}$ such that $(z_{[n]}, \pi) \in \mathcal{P}$, there exists a constant $\Lambda = \Lambda(\zeta) > 0$ (Assumption (A2)) such that

\[ E\Big[ \sup_{\pi \in \mathcal{P}(z_{[n]})} \rho_n \sum_{i \ne j} \xi_{ij}\, g_{ij} \ \Big|\ Z = z^* \Big] \le \Lambda\, [n(n-1)]^{-1/2} . \]

Proof of Lemma C.3. Let $E^*[\,\cdot\,]$ denote the expectation given $Z = z^*$. Then,



sup p„ 



E 



< E 



X,X' 



sup p„ 



where the $X'_{i,j}$s are independent random variables with the same distribution as the $X_{i,j}$s. A symmetrization argument based on Rademacher variables



leads to 



E* sup p„ 



< 2E* 



sup p„Ee 



where $E_\epsilon[\,\cdot\,]$ denotes the expectation with respect to the $\epsilon_{i,j}$s. Then, Jensen's inequality yields



E* sup /9„ 



E 



<2l 



<2E* 



sup 



\ 



Var, 



sup Pn\/n{n- l)g, 



□ 
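Appendix D: Theorem 3.6
D.1. Proof of Theorem 3.6

D.1.1. Notation

For any metric space $\Theta$ and any real-valued function $f : \Theta \to \mathbb{R}$, let us define $\|f\|_\Theta$ by

\[ \|f\|_\Theta := \sup_{\theta \in \Theta} |f(\theta)| . \]

Let also $\alpha^*$ and $\pi^*$ be the true values of $\alpha$ and $\pi$ in SBM (see Section 2.1), $\mathcal{A}$ be the set of stochastic matrices of size $Q$ given by $\mathcal{A} = \{ A = (a_{k,l})_{1 \le k,l \le Q} \mid a_{k,l} \ge 0,\ \sum_{l=1}^Q a_{k,l} = 1 \}$.

Furthermore, let us introduce the following quantities

\[ \phi_n(\pi, z_{[n]}) = \frac{1}{n(n-1)} \mathcal{L}_1(X_{[n]}; z_{[n]}, \pi) , \qquad \hat z_{[n]}(\pi) = \mathrm{Argmax}_{z_{[n]}} \phi_n(z_{[n]}, \pi) , \]

\[ \Phi_n(\pi, z_{[n]}) = \frac{1}{n(n-1)} \sum_{i \ne j} \big[ \pi^*_{z_i^*,z_j^*} \log \pi_{z_i,z_j} + (1 - \pi^*_{z_i^*,z_j^*}) \log(1 - \pi_{z_i,z_j}) \big] , \]

\[ \tilde z_{[n]}(\pi) = \mathrm{Argmax}_{z_{[n]}} \Phi_n(z_{[n]}, \pi) , \]

\[ M_n(\alpha, \pi) = \frac{1}{n(n-1)} \mathcal{L}_2(X_{[n]}; \alpha, \pi) , \]

\[ M(\pi, A) = \sum_{q,l} \alpha^*_q \alpha^*_l \sum_{q',l'} a_{q,q'}\, a_{l,l'} \big[ \pi^*_{q,l} \log \pi_{q',l'} + (1 - \pi^*_{q,l}) \log(1 - \pi_{q',l'}) \big] , \]

\[ \bar A_\pi = \mathrm{Argmax}_{A \in \mathcal{A}} M(\pi, A) , \qquad M(\pi) = M(\pi, \bar A_\pi) . \]

Note that $\bar A_\pi$ is not necessarily unique for every $\pi$. However our analysis only requires unicity of $\bar A_{\pi^*}$ and $\bar A_{\pi^*} = I_Q$, which is proved in the following reasoning. Furthermore $M(\pi^*) = \sum_{q,l} \alpha^*_q \alpha^*_l H^*_{q,l}$, where $H^*_{q,l} = \pi^*_{q,l} \log \pi^*_{q,l} + (1 - \pi^*_{q,l}) \log(1 - \pi^*_{q,l})$.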

D.1.2. Proof

First let us prove $\bar A_{\pi^*}$ is unique and $\bar A_{\pi^*} = I_Q$. Let us assume $\bar A_{\pi^*} \ne I_Q$. By definition of $\bar A_\pi$, it results

\[ 0 \le M(\pi^*, \bar A_{\pi^*}) - M(\pi^*, I_Q) = \sum_{q,l} \alpha^*_q \alpha^*_l \sum_{q',l'} a_{q,q'}(\pi^*)\, a_{l,l'}(\pi^*) \Big[ \pi^*_{q,l} \log \frac{\pi^*_{q',l'}}{\pi^*_{q,l}} + (1 - \pi^*_{q,l}) \log \frac{1 - \pi^*_{q',l'}}{1 - \pi^*_{q,l}} \Big] . \]

Therefore for every $(q, q', l, l')$, $a_{q,q'}(\pi^*)\, a_{l,l'}(\pi^*)\, K(\pi^*_{q,l}, \pi^*_{q',l'}) = 0$ by (A3).

Since $\sum_{l'} a_{l,l'}(\pi^*) = 1$ implies for every $1 \le l \le Q$, there exists $1 \le l' \le Q$ such that $a_{l,l'}(\pi^*) > 0$, there exists $f : \{1, \ldots, Q\} \to \{1, \ldots, Q\}$ such that $\pi^*_{q,l} = \pi^*_{f(q),f(l)}$. Then

• $f$ is a permutation of $\{1, \ldots, Q\}$ is excluded since we are working up to label switching,

• otherwise there exist two indices $q_1$ and $q_2$ ($q_1 \ne q_2$) such that rows $q_1$ and $q_2$ of $\pi^*$ are equal and so are the corresponding columns, which is excluded by (A1),

which proves the unicity and that $\bar A_{\pi^*} = I_Q$.
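
The fact that the identity matrix maximizes $A \mapsto M(\pi^*, A)$ can be illustrated numerically: each bracketed term of the difference $M(\pi^*, A) - M(\pi^*, I_Q)$ is minus a Kullback-Leibler divergence between two Bernoulli distributions, hence non-positive. The sketch below is an illustration only; the function name `M`, the random draws and the tolerance are assumptions.

```python
import numpy as np

def M(pi, A, alpha_star, pi_star):
    """M(pi, A) = sum_{q,l} a*_q a*_l sum_{q',l'} A[q,q'] A[l,l'] [pi*_ql log pi_q'l' + (1-pi*_ql) log(1-pi_q'l')]."""
    w = np.outer(alpha_star, alpha_star)                       # alpha*_q alpha*_l
    # T[q, l, q', l'] = pi*_ql log pi_q'l' + (1 - pi*_ql) log(1 - pi_q'l')
    T = (pi_star[:, :, None, None] * np.log(pi)[None, None, :, :]
         + (1.0 - pi_star)[:, :, None, None] * np.log1p(-pi)[None, None, :, :])
    return float(np.einsum("ql,qa,lb,qlab->", w, A, A, T))

rng = np.random.default_rng(0)
Q = 3
alpha_star = rng.dirichlet(np.ones(Q))                          # class proportions
pi_star = rng.uniform(0.1, 0.9, size=(Q, Q))                    # connectivity matrix
A = rng.dirichlet(np.ones(Q), size=Q)                           # a random row-stochastic matrix
print(M(pi_star, np.eye(Q), alpha_star, pi_star) - M(pi_star, A, alpha_star, pi_star))
```

The printed gap is non-negative for any row-stochastic `A`, and it vanishes only when every pair of classes mixed by `A` shares the same connection probabilities, which is the situation ruled out by the identifiability assumptions in the argument above.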

Second, let us prove that: $\forall \eta > 0$, $\sup_{d(\pi,\pi^*) \ge \eta} M(\pi) < M(\pi^*)$. In the sequel, let $(a_{q,l})_{1 \le q,l \le Q}$ denote coefficients of $\bar A_\pi$. Without further indication, $\pi_{q,l}$ depends on the matrix $\pi$. Then,

\[ M(\pi) - M(\pi^*) = \sum_{q,l} \alpha^*_q \alpha^*_l \sum_{q',l'} a_{q,q'}\, a_{l,l'} \Big[ \pi^*_{q,l} \log \frac{\pi_{q',l'}}{\pi^*_{q,l}} + (1 - \pi^*_{q,l}) \log \frac{1 - \pi_{q',l'}}{1 - \pi^*_{q,l}} \Big] . \]

Since $\{ \pi \mid d(\pi, \pi^*) \ge \eta,\ (A1),\ (A2) \}$ is a compact set, there exists $\pi^0 \ne \pi^*$ satisfying (A1) and (A2) such that

\[ \sup_{d(\pi,\pi^*) \ge \eta} M(\pi) - M(\pi^*) = M(\pi^0) - M(\pi^*) < 0 . \]

Otherwise for every $(q,l)$, $\sum_{q',l'} a_{q,q'}\, a_{l,l'}\, K(\pi^*_{q,l}, \pi^0_{q',l'}) = 0$ would entail by (A3) that for every $(q, l, q', l')$, $a_{q,q'}\, a_{l,l'}\, K(\pi^*_{q,l}, \pi^0_{q',l'}) = 0$. It implies that there exists $f : \{1, \ldots, Q\} \to \{1, \ldots, Q\}$ such that $\pi^*_{q,l} = \pi^0_{f(q),f(l)}$. The same reasoning as for the unicity of $\bar A_{\pi^*}$ leads to

• if $f$ is a permutation of $\{1, \ldots, Q\}$: a contradiction arises since $\pi^0 \ne \pi^*$ up to label switching,

• otherwise there exist two indices $q_1$ and $q_2$ ($q_1 \ne q_2$) such that rows $q_1$ and $q_2$ of $\pi^0$ are equal and so are the corresponding columns, which is excluded by (A1).

Third, let us prove that $\sup_{\alpha,\pi} |M_n(\alpha, \pi) - M(\pi)| \xrightarrow[n \to +\infty]{P} 0$. Set

\[ |M_n(\alpha, \pi) - M(\pi)| \le |M_n(\alpha, \pi) - \phi_n(\pi, \hat z_{[n]})| \qquad (18) \]
\[ \qquad + |\phi_n(\pi, \hat z_{[n]}) - \Phi_n(\pi, \tilde z_{[n]})| \qquad (19) \]
\[ \qquad + |\Phi_n(\pi, \tilde z_{[n]}) - M(\pi)| . \qquad (20) \]

These three terms are successively controlled in the following.
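Upper bound of (18): Lemma F.2 implies that $P$-a.s.,

\[ \sup_{\alpha,\pi} |M_n(\alpha, \pi) - \phi_n(\pi, \hat z_{[n]})| = \sup_{\alpha,\pi} \frac{|\mathcal{L}_2(X_{[n]}; \alpha, \pi) - \mathcal{L}_1(X_{[n]}; \hat z_{[n]}, \pi)|}{n(n-1)} \le \frac{\log(1/\gamma)}{n-1} \xrightarrow[n \to +\infty]{} 0 . \]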



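Upper bound of (19): Let us first introduce several quantities. Set $\Delta(\pi, \hat z_{[n]}, \tilde z_{[n]}) = |\phi_n(\pi, \hat z_{[n]}) - \Phi_n(\pi, \tilde z_{[n]})|$, $\Delta^+(\pi, \hat z_{[n]}, \tilde z_{[n]}) = \phi_n(\pi, \hat z_{[n]}) - \Phi_n(\pi, \tilde z_{[n]})$, and $\Delta^-(\pi, \hat z_{[n]}, \tilde z_{[n]}) = -\Delta^+(\pi, \hat z_{[n]}, \tilde z_{[n]})$. Then, it comes

\[ P^*\Big[ \sup_\pi \Delta(\pi, \hat z_{[n]}, \tilde z_{[n]}) > \eta \Big] \le P^*\Big[ \sup_\pi \big\{ \Delta^-(\pi, \hat z_{[n]}, \tilde z_{[n]})\, 1_{\Delta^- \ge 0} \big\} > \eta \Big] + P^*\Big[ \sup_\pi \big\{ \Delta^+(\pi, \hat z_{[n]}, \tilde z_{[n]})\, 1_{\Delta^+ > 0} \big\} > \eta \Big] . \]

1. If $\Delta^-(\pi, \hat z_{[n]}, \tilde z_{[n]}) \ge 0$,

\[ |\phi_n(\pi, \hat z_{[n]}) - \Phi_n(\pi, \tilde z_{[n]})| = \Phi_n(\pi, \tilde z_{[n]}) - \phi_n(\pi, \hat z_{[n]}) \le \Phi_n(\pi, \tilde z_{[n]}) - \phi_n(\pi, \tilde z_{[n]}) , \]

since $\phi_n(\pi, \tilde z_{[n]}) \le \phi_n(\pi, \hat z_{[n]})$. Then, Proposition 3.5 leads to

\[ \sup_\pi \big\{ \Delta^-(\pi, \hat z_{[n]}, \tilde z_{[n]})\, 1_{\Delta^- \ge 0} \big\} \le \sup_\pi \Delta(\pi, \tilde z_{[n]}, \tilde z_{[n]}) \xrightarrow[n \to +\infty]{P} 0 . \]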



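2. Otherwise $\Delta^+(\pi, \hat z_{[n]}, \tilde z_{[n]}) > 0$,

\[ |\phi_n(\pi, \hat z_{[n]}) - \Phi_n(\pi, \tilde z_{[n]})| = \phi_n(\pi, \hat z_{[n]}) - \Phi_n(\pi, \tilde z_{[n]}) . \]

Distinguishing between settings where $(\hat z_{[n]}, \pi) \in \mathcal{P}$ or not, it results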



sup 



< P* 



+ P* 



|A+(7r,Z[„],Z[„])lA+(77,z[„,,?[„j)>o} ^ 

|A+(7r,Z[„],Z[„])lA + (7r,?[„j,?H)>o} > ''1 

{a+( 



sup 

TT, (z[„],ir)G7' 



sup 



^/(%],^) e^•• 
'/'T^(7r,Z[„]) > -00 and 



|0„(7r,Z[„]) - $„(7r, Z[„])| < (7!)„(7r, Z[„]) - $„(7r,Z[„]) 
P*



sup ] A+(7r,Z[„],Z[„])lA+(7r,?,„,.?lnl)>0f > 



< P* [3-K, $„(7r,Z[„]) = -oo, A+(7r,Z[„],Z[„]) > 77] . 

Set a sequence $\{\epsilon_n\}_{n \in \mathbb{N}^*}$ such that $\epsilon_n \to 0$ and $n\epsilon_n \to +\infty$ as $n \to +\infty$. Then,



P* 



sup I A+(7r,Z[„],Z[„])lA+(^.j, , ,)>o[ > ri 



= P*[37r, $„(7r,Z[„]) = -00, A+(7r,Z[„], Z[„]) > ?7, 7V(z[„], tt) < e„n(n - 1) ] 

(21) 

+ P* [37r, $„(7r,Z[„]) = -00, A+(7r,Z[„],2[„]) > 77, 7V(z[„],7r) > e„n(n - 1)] , 

(22) 

where $N(z_{[n]}, \pi) = \big| \{ (i,j) \mid i \ne j,\ \pi_{z_i,z_j} \in \{0, 1\} \text{ and } \pi_{z_i,z_j} \ne \pi^*_{z_i^*,z_j^*} \} \big|$.
The first term (21) in the right-hand side is dealt with by Proposition D.1:

P* [37r, $„(7r,Z[„]) = -00, A+(7r,Z[„],Z[„]) > 77, A^(z[„],7r) < e„?T.(7i - 1)] 



< P* 

< P* 



Btt, $„(7r,Z[„]) = -00, 2a„ + A(7r,?[„],Z[„]) > 77, A^(z[„], tt) < e„n(7i - 1) 

>0 , 



2a„ + sup A(7r,Z[„],Z[„]) > 77 

(Z(„],7r)e-P 



n— ^+00 



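following the proof of Proposition 3.5.
The second term (22) is upper bounded by noticing that

\[ \{ \phi_n(\pi, \hat z_{[n]}) > -\infty \} \cap \{ \Phi_n(\pi, \hat z_{[n]}) = -\infty \} \subset \Big( \bigcap_{(i,j) \in M_0} \{ X_{i,j} = 0 \} \Big) \cap \Big( \bigcap_{(i,j) \in M_1} \{ X_{i,j} = 1 \} \Big) , \]

where $M_0 = \{ (i,j) \mid i \ne j,\ \pi_{\hat z_i,\hat z_j} = 0 \text{ and } \pi^*_{z_i^*,z_j^*} > 0 \}$ and $M_1 = \{ (i,j) \mid i \ne j,\ \pi_{\hat z_i,\hat z_j} = 1 \text{ and } \pi^*_{z_i^*,z_j^*} < 1 \}$.
Thus,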



P*[3tt, $„(7r,Z[„]) = -oo, A+(7r,Z[„],Z[„]) > 77, iV(2[„], tt) > e„n(7i - 1) _ 

enn{'n—l) 

(1 -e)'""^""^' ^0 



< p* 



where $\{Y_k\}_{1 \le k \le \epsilon_n n(n-1)}$ denote i.i.d. Bernoulli variables with parameter



and $a \wedge b = \min(a, b)$.

Finally, since no upper bound depends on $z^*_{[n]}$, every convergence in probability with respect to $P^*$ can be turned into a convergence with respect to $P$.



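Upper bound of (20): $\Phi_n(\pi, z_{[n]})$ can be expressed as:

\[ \Phi_n(\pi, z_{[n]}) = \sum_{q,l,q',l'} \frac{N_{q,q'}(z_{[n]})\, N_{l,l'}(z_{[n]})}{n(n-1)} \big[ \pi^*_{q,l} \log \pi_{q',l'} + (1 - \pi^*_{q,l}) \log(1 - \pi_{q',l'}) \big] , \qquad (23) \]

where $N_{q,q'}(z_{[n]}) = |\{ i \mid z_i^* = q \text{ and } z_i = q' \}|$.

Let $N_{q,q'}(\pi) = N_{q,q'}(\tilde z_{[n]}(\pi))$, $N_q = |\{ i \mid z_i^* = q \}|$, $a_{q,q'}(\pi) = N_{q,q'}(\pi)/N_q$, and $\tilde A_\pi$ the stochastic matrix of the $a_{q,q'}(\pi)$. Coefficients $a_{q,q'}(\pi)$ yield the proportion of vertices from class $q$ attributed to class $q'$ by $\tilde z_{[n]}$. Note that (23) shows that $\Phi_n(\pi, z_{[n]})$ only depends on $z_{[n]}$ through the matrix $A(z_{[n]})$. Therefore, one uses the notation $\Phi_n(\pi, A(z_{[n]}))$ in place of $\Phi_n(\pi, z_{[n]})$.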

Definitions of and imply that 'i>„(7r, A^r) > $„(7r, ^^) and M(7r) = 
M(7r, A^) > M(7r, A^). Therefore, 

1. $„(7r,A) >M(jr) 

^ ^$„(^,A,)-M(7r) <$„(7r,A,)-M(^,A,), 

2. $„(7r,A^) <M(7r) 

^ < M(7r) - $„(^, A^) < M(^, i^) - $„(^, i,). 

Then, 



$„ {tt,A^)-M{tt) < sup I $„ {tt,A)^ M{tt, A) \ 



Moreover, for every $A \in \mathcal{A}$,

$„(7r, A) - M{n,A) = 



E 

qq'W 



aqqiaw [1^1,1 logTT,/;' + (1 - 7rJ_,)log(l - 7r,'i')] 



ri(n - 1) " 

Since any $\pi_{q',l'} \in \{0,1\}$ such that $\pi^*_{q,l} \neq \pi_{q',l'}$ is excluded, (A2) provides

\[ \left| \pi^*_{q,l} \log \pi_{q',l'} + (1 - \pi^*_{q,l}) \log(1 - \pi_{q',l'}) \right| \le \Lambda(\zeta) < +\infty, \]

where $\Lambda(\zeta) > 0$ is independent of $\pi$ and $q$, and only depends on $\zeta$ from Assumption (A2).

Then, since $0 \le a_{q,l} \le 1$ for every $(q,l)$, the strong law of large numbers
applied to each $N^*_q$ entails that $\sup_\pi \{ |\Phi_n(\pi, z_{[n]}(\pi)) - M(\pi)| \} \to 0$ $P$-a.s.



D.2. Proof of Proposition D.1



Proposition D.1 (Existence of a copy of $\hat z_{[n]}$ in $\mathcal{P}$). Let $\pi$ be defined as in Section 2.1
and satisfying (A2). Let us further assume that there exists a sequence



{en}„gN. such that Cn and : 



-oo as n +00, and 



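\[ \big|\{(i,j) \mid i \neq j,\ \pi_{\hat z_i,\hat z_j} \in \{0,1\} \text{ and } \pi_{\hat z_i,\hat z_j} \neq \pi^*_{z^*_i,z^*_j}\}\big| < \epsilon_n n(n-1), \quad \text{a.s.} \]

Then, there exist $z^P_{[n]} \in \mathcal{P}$ and a real sequence $\{a_n\}_n$ such that

\[ |\phi_n(\pi, \hat z_{[n]}) - \phi_n(\pi, z^P_{[n]})| \le a_n, \quad \text{and} \quad |\Phi_n(\pi, \hat z_{[n]}) - \Phi_n(\pi, z^P_{[n]})| \le a_n, \quad \text{a.s.}, \]

where $a_n \to 0$ as $n \to +\infty$, and $a_n$ depends neither on $z^*_{[n]}$ nor on $\pi$.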



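Proof of Proposition D.1. First let us introduce

\[ L = \{ (q_1,q_2) \in \{1,\dots,Q\}^2 \mid N_{q_1,q_2} > n\sqrt{\epsilon_n} \}, \]

where

\[ N_{q_1,q_2} = |\{ 1 \le i \le n \mid \hat z_i = q_1,\ z^*_i = q_2 \}|. \]

For every $1 \le i \le n$, we define $z^P_i$ in the following way:

1. $z^P_i = \hat z_i$, if $(\hat z_i, z^*_i) \in L$,

2. $z^P_i = c(z^*_i)$, otherwise,

where $1 \le c(z^*_i) \le Q$ is obtained by applying Lemma D.2 with $q_2 = z^*_i$.
Then it results that $(z^P_i, z^*_i) \in L$ for every $1 \le i \le n$.
Let us now introduce

\[ \mathcal{N} = \{ (q,q',l,l') \in \{1,\dots,Q\}^4 \mid \pi_{q,l} \in \{0,1\} \text{ and } \pi^*_{q',l'} \neq \pi_{q,l} \}. \]

Then for every couple $(i,j)$, $(z^P_i, z^*_i, z^P_j, z^*_j) \notin \mathcal{N}$ since $(z^P_i, z^*_i) \in L$ and
$(z^P_j, z^*_j) \in L$ thanks to Lemma D.3. As a consequence, it comes $z^P_{[n]} \in \mathcal{P}$ since

\[ \big|\{(i,j) \mid i \neq j,\ \pi_{z^P_i,z^P_j} \in \{0,1\} \text{ and } \pi_{z^P_i,z^P_j} \neq \pi^*_{z^*_i,z^*_j}\}\big| = 0. \]

Finally, the conclusion results from (A2) by noticing that the number of
changes between $\hat z_{[n]}$ and $z^P_{[n]}$ is at most $Q^2 n \sqrt{\epsilon_n}$.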

□ 
Lemma D.2. Set $L = \{ (q_1,q_2) \in \{1,\dots,Q\}^2 \mid N_{q_1,q_2} > n\sqrt{\epsilon_n} \}$, where

\[ N_{q_1,q_2} = |\{ 1 \le i \le n \mid \hat z_i = q_1,\ z^*_i = q_2 \}|. \]

With the notation and assumptions of Proposition D.1, if (A4) holds true then

\[ \forall\, 1 \le q_2 \le Q, \quad \exists\, 1 \le q_1 \le Q, \quad (q_1,q_2) \in L. \]



Proof of Lemma D.2. Otherwise, there exists $q_2$ such that for every $1 \le q_1 \le Q$,
$(q_1,q_2) \notin L$. Then,

\[ |\{ 1 \le i \le n \mid z^*_i = q_2 \}| = \sum_{q_1=1}^{Q} N_{q_1,q_2} \le Q\, n \sqrt{\epsilon_n}, \]

which contradicts (A4).



□ 



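Lemma D.3. With the same notation and assumptions as Lemma D.2, let us
introduce

\[ \mathcal{N} = \{ (q,q',l,l') \in \{1,\dots,Q\}^4 \mid \pi_{q,l} \in \{0,1\} \text{ and } \pi^*_{q',l'} \neq \pi_{q,l} \}. \]

Then,

\[ (q,q',l,l') \in \mathcal{N} \ \Rightarrow\ (q,q') \notin L \ \text{ or } \ (l,l') \notin L. \]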



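Proof of Lemma D.3. If $(q,q') \in L$ and $(l,l') \in L$, then $N_{q,q'} N_{l,l'} > n^2 \epsilon_n$, which
contradicts that

\[ \big|\{(i,j) \mid i \neq j,\ \pi_{\hat z_i,\hat z_j} \in \{0,1\} \text{ and } \pi_{\hat z_i,\hat z_j} \neq \pi^*_{z^*_i,z^*_j}\}\big| < \epsilon_n n(n-1). \]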



□ 
Appendix E: Proof of Proposition 3.8
E.1. Proof of Proposition 3.8

Preliminaries. First, in the same line as the proof of Theorem 3.1, the main
quantity to deal with is



log ■ 

= ^|x,,log( 



(24) 



+ (l-X,j)log 



1 - TT , 



1 



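where $\hat P^{X_{[n]}}(z_{[n]}) = P_{\hat\alpha,\hat\pi}(Z_{[n]} = z_{[n]} \mid X_{[n]})$ is the same quantity as
$P^{X_{[n]}}(z_{[n]}) = P_{\alpha^*,\pi^*}(Z_{[n]} = z_{[n]} \mid X_{[n]})$ where $(\alpha^*,\pi^*)$ has been replaced by
$(\hat\alpha,\hat\pi)$, and $z_{[n]}$ and $z^*_{[n]}$ denote label vectors that differ in exactly $r$ coordinates, with
$1 \le r \le n$.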

Second, let us assume that 



Itt — 7r*||„ < min 



min 

{q,i), x;,^{oa} 



r. with 



(25) 



where $\zeta$ is given by (A2), which is fulfilled on the event

\[ \Omega_n = \{ \|\hat\pi - \pi^*\|_\infty \le v_n \} \qquad (26) \]

for large enough values of $n$ since $v_n = o(\sqrt{\log n}/n)$. Note that by assumption,
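$P[\Omega_n^c] \to 0$. It is also important to notice that the definition of $\hat\pi$ implies

that every $\hat\pi_{q,l} \in \{0,1\} \cup [\zeta, 1-\zeta]$ (see (A2)), which leads on $\Omega_n$ to

\[ \forall (q,l), \quad \pi^*_{q,l} \in \{0,1\} \ \Rightarrow\ \hat\pi_{q,l} = \pi^*_{q,l}. \qquad (27) \]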

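Finally, for a given vector $z_{[n]}$, let us introduce the following sets of couples

\[ D^* = D^*(z_{[n]}) := \{ (i,j) \mid i \neq j,\ \pi^*_{z_i,z_j} \neq \pi^*_{z^*_i,z^*_j} \}, \qquad (28) \]

\[ \hat D = \hat D(z_{[n]}) := \{ (i,j) \mid i \neq j,\ \hat\pi_{z_i,z_j} \neq \hat\pi_{z^*_i,z^*_j} \}. \qquad (29) \]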

Proof First, the log-ratio (24) can be decomposed into the following terms 



log- 



P-H(.* ) 



E]^^.log(f^).(l_X..)log^i3f^)[.Elo.| 

(l-X,,,)log 
Second, using the definition of $D^*$ and $\hat D$ given by Eq. (28) and Eq. (29), and

that $\overline{D^*} \cap \overline{\hat D} \subset \{ (i,j) \mid i \neq j,\ h(X_{i,j}; \hat\pi, \pi^*, z_i, z_j, z^*_i, z^*_j) = 0 \}$, where $\overline{A}$ denotes

the complement of any set $A$ and



h{X,j;Tr,T:*,z„Zj,z*,z*) 



log 



+(l-X,,,)log 



1-9, 



1 — TT*. 



it results 



log 



(i,i)G-D* 



^^'"'(^n) 

P^H(zj;j) 

log 



1 - tt! 



E ■ 



Xij log 



+ (1-X,,j)l0g 



1 - TT* ■ 



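Finally, from the following equalities

\[ \log \hat\pi_{z_i,z_j} = \log \pi^*_{z_i,z_j} + \log\left( 1 + \frac{\hat\pi_{z_i,z_j} - \pi^*_{z_i,z_j}}{\pi^*_{z_i,z_j}} \right) \]

and

\[ \log(1 - \hat\pi_{z_i,z_j}) = \log(1 - \pi^*_{z_i,z_j}) + \log\left( 1 - \frac{\hat\pi_{z_i,z_j} - \pi^*_{z_i,z_j}}{1 - \pi^*_{z_i,z_j}} \right), \]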



the last sum can be further split into 



^z.,z, ^'z^z; 
/z*„z, ^z*,z;^ 



(l-X,,,)log 



l-^z.,z, l-<r,zj 
l-T^z^.z, l-7fz*,z* 



(ij)6D*UD 

- E log 

(ij)eD*uD 



K.zA'^-K,zA 



(t^z'^z* - 7r*._^.)(X,j - Tr*.,^.) 
This leads to



log- 



V [n]' 



1 - tt! 



+ 5Z 

(•ij)Gn*u5 

- E log 

(ij')eD*u5 



1 -I 1 ! — J ! — i_ 



Note that (27) implies for every I < i ^ j < n, 

{^z,,z, -ttI^^ )iX^j -ttI^^ ) 



<,z, e {0,1}^ log 



1 + 



In the sequel, the strategy consists in providing successive upper bounds for $T_1$,
$T_2$, and $T_3$.
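
The split into $T_1$, $T_2$, and $T_3$ relies on merging the two logarithmic corrections into a single one. For a binary observation $x \in \{0,1\}$, the quantity $x \log(\hat\pi/\pi^*) + (1-x)\log((1-\hat\pi)/(1-\pi^*))$ equals $\log\big(1 + (\hat\pi-\pi^*)(x-\pi^*)/(\pi^*(1-\pi^*))\big)$. The following minimal sketch checks this identity numerically on a grid of interior values; the function names and the grid are illustrative assumptions and are not part of the original proof.

```python
import math


def merged_log_term(x, pi_hat, pi_star):
    """Single-logarithm form: log(1 + (pi_hat - pi_star)(x - pi_star) / (pi_star (1 - pi_star)))."""
    return math.log1p((pi_hat - pi_star) * (x - pi_star) / (pi_star * (1.0 - pi_star)))


def split_log_term(x, pi_hat, pi_star):
    """Two-logarithm form: x log(pi_hat / pi_star) + (1 - x) log((1 - pi_hat) / (1 - pi_star))."""
    return (x * math.log(pi_hat / pi_star)
            + (1 - x) * math.log((1.0 - pi_hat) / (1.0 - pi_star)))


def max_identity_gap(grid):
    """Largest absolute gap between both forms for x in {0, 1} and all grid pairs."""
    gap = 0.0
    for pi_star in grid:
        for pi_hat in grid:
            for x in (0, 1):
                diff = merged_log_term(x, pi_hat, pi_star) - split_log_term(x, pi_hat, pi_star)
                gap = max(gap, abs(diff))
    return gap


# Hypothetical grid of parameters strictly inside (0, 1): 0.05, 0.10, ..., 0.95.
GRID = [k / 20 for k in range(1, 20)]
print(max_identity_gap(GRID))
```

The identity only makes sense when $\pi^* \notin \{0,1\}$; the degenerate case is the one handled separately through (27).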

Upper bounding $T_1$

The magnitude of $T_1$ is given by a similar argument to that in the proof of
Theorem 3.1. Let us consider

'l-TT* 



1 - TT*. \ 1 



^ lO; 



TT*. 1 - tt! 



+ (1 log 



1 — TT* 



1-71-* 



D' y 

$= T_{1,1} + T_{1,2}$.
Then for every $t \in \mathbb{R}$,

\[ P^*[T_1 > t] = P^*[T_{1,1} + T_{1,2} > t]. \]

Upper bound of $T_{1,2}$: The same proof as that of Lemma B.3 shows there exists
a constant $K(\pi^*) = K^* > 0$ such that
Ti,2i\D*\y^< max -fc -iC* < , 
(1



7r*j)log ^y— ps^^J and \D*\ denotes the cardinality oi D*. Thus, 

\[ P^*[T_1 > t] \le P^*[T_{1,1} - |D^*| K^* > t]. \]
Upper bound of $T_{1,1}$:



P* [Ti > t 
< P 



J2 

(t,j)eD* 



log 



TT*, . 1 — TT* 

z' ,Z* Zi,Zi 



>t+\D*\K* 



Hoeffding's inequality associated with (A2) provides a constant $C_\zeta > 0$ such
that for every $t \in \mathbb{R}$



< exp 



E (^., log 

ii,j)eD' 

\D*\^ (K*)^ + 2t\D*\K* 



1 - TT* 



TT*. . 1 - TT* 



exp 



-2t- 



"c7 



■ exp 



\D* 



*\2 



Upper bounding $T_2$

With $t > 0$ on the event $\{T_2 > t\}$, $\log(1+x) \le x$ for every $x > -1$ leads to

\[ 0 < t < T_2 \le \sum_{(i,j) \in \hat D \cup D^*} \frac{(\hat\pi_{z_i,z_j} - \pi^*_{z_i,z_j})(X_{i,j} - \pi^*_{z_i,z_j})}{\pi^*_{z_i,z_j}(1 - \pi^*_{z_i,z_j})}. \]

Then with Npj, = E(,,j)e_DuD* l(z*=g',z;=/')l(z,=9,z,=i)' comes 



E 



E E 



E<r 

q'.l' 
Introducing the event $\Omega_n$ defined by (26) and using (A2), one gets for every
$t > 0$

p*[a.n{T2 >t}] 



q,l 



q' ,1' (i.j) e D u D* 
= (9',!') 



> t/{2Vn) 



P* 



c^E 



q'.l' 



> t/i2Vn) 



For the first term, Hoeffding's inequality requires summing over a deterministic
set of indices, which leads to



c^E 



qd 



E E 



q',l' (i, j) e OU D' 
(zi.zj) = (q.l) 



E 

k D. \D\ = k 



c^E 

qj 



E E " "^9^'') 



q' ,V (i, 3) G -D U -D* 
(^*,^*) = (I'.i') 

= (5,0 



> t/{2v„) 



where the sum over $k$ is computed for $\lceil (\gamma/2)\, nr \rceil \le k \le 2nr$ by Proposition B.4
and Lemma E.1.

For each set $D$ such that $|D| = k$, a union bound and Hoeffding's inequality
provide



P* 



c^E 

q,l 



E E 

q' ,1' (i, 3) e -D U -D* 
(2*, J!*) = (,',!') 
(^i.^,-) = (5,0 



X 



> t/(2z;„) 



<Q^maxP* 

q.l 



<Q exp 



E E 

q' ,1' (i, j) e -D U D* 
= (,',!') 

2 

[2z;„(Cg)2]'^ + l^* 



= exp 



>t/ [2v„{CQ)' 



1 
Then,



p* 



q.l 



q' ,1' e D u r>* 

(^i, z,-) = {q, 1) 



> i/(2wr0 



<Q2 ^ {2nrf exp 

fc=[7/2 nr] 



1 



For the second term, Lemma E.1 provides



qU 

<g^maxP* 



q'.V 



<Q2p* [4„,^>t/ [2w„(CQ)2]] . 



> t/{2Vn) 

>t/ [2Vr.{CQf 



Upper bounding T3 

Let us first notice 



log 



1 



=(l-X,,,)log 
Then, 

E 

{i,3)eDVjD' 



J- log 



^ ■> ^ ' 3 ^ ' .7 



+ Xij log 
Centering the $X_{i,j}$s, it comes



(i,j)€DuD' 



:J2 E K,-^...)l0, 



q,l (i, 3) e -D U D* 

(^*, ^*) = (9. 



1 - 



E E 



q,l (i, j) e n u -D* 
= (9,0 



7 - TT, 



q.l> 



T^g,; log 



which leads to 

+ E^;.' 

q.l 



{T^q,l - TT*,) 



- log 



1 - 



(l-<i)log 



log 



E (^^j- - 

{i, i) € D U D* 
= (g.l) 



where $N^*_{q,l} = \sum_{(i,j) \in \hat D \cup D^*} \mathbb{1}(z^*_i = q,\ z^*_j = l)$.

Second, on the event $\Omega_n$, (25) and $|\log(1+x)| \le 2|x|$ for every $x \in [-1/2, 1/2]$
entail



\T3\<4v„Y. 



( = "'■) = (1. 1) 



q,l 



q,l 
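
The bounds on $T_2$ and $T_3$ use two elementary inequalities: $\log(1+x) \le x$ for every $x > -1$, and $|\log(1+x)| \le 2|x|$ for every $x \in [-1/2, 1/2]$. The sketch below is only a numerical sanity check on a finite grid, not a proof. The function name, the grid sizes, and the upper end of the first grid are arbitrary illustrative choices.

```python
import math


def check_log_bounds(num_points=2001):
    """Check both logarithmic inequalities on finite grids.

    Returns a pair of booleans:
      * log(1 + x) <= x        on a grid of (-1, 3],
      * |log(1 + x)| <= 2 |x|  on a grid of [-1/2, 1/2].
    A small tolerance absorbs floating-point rounding.
    """
    tol = 1e-12
    first_ok = True
    for k in range(1, num_points + 1):
        x = -1.0 + 4.0 * k / num_points  # strictly larger than -1
        first_ok = first_ok and (math.log1p(x) <= x + tol)
    second_ok = True
    for k in range(num_points):
        x = -0.5 + k / (num_points - 1)  # from -1/2 to 1/2 inclusive
        second_ok = second_ok and (abs(math.log1p(x)) <= 2.0 * abs(x) + tol)
    return first_ok, second_ok


print(check_log_bounds())
```

The second inequality is tightest at $x = -1/2$, where $|\log(1/2)| \approx 0.693 \le 1$, which is why the restriction to $[-1/2, 1/2]$ matters.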



Then for every t > 0, 



p*[n„n{\n\>t}]<p* 



4-«E 



9,Z 



E i^i,j~'^q,l) 



{i.j) e D U D* 

i^t'^p = (9. ') 



> t/2 



+ P* 4i;„^iV;, >t/2 

q,l 

Similarly to T2, partitioning and Hoeffding's inequality lead to 



p* 



4«»E 



q,l 
2nr 



(Z* J!*) = (<J, 1) 



> t/2 



<Q^ {2nrf exp 

k— [7/2 nr] 



i;2(fc+|i5*|) 
P*



q,l 



< P* [v,Anr >t/8] 



Then, 



P*[n„n{\n\>t}]<Q^ ^ (2nr)'=cxp 

k=[j/2nr] 

+ P* [vn^nr > t/8] 



vl{k+\D*\) 



Gathering $T_1$-, $T_2$-, and $T_3$-upper bounds

At the beginning, the following steps are very close to those in the proof of
Theorem B.4.
For every $\epsilon > 0$,



< P* 



f-f . , P^H([z* 1) 



E 



> e > n n.„ 



P*\n':-, 



Furthermore, 



< 



< 



E E 



E 



^E 



.1 ? 



E 

.1 2 K^]^ 



loe 



lof 



> -(r + l)logn-rlogQ + loge^ n 17„ 



> — 5rlogn > niln 



{n > max {Q, e ^}) 



^ P*[{Ti+T2-T3>-5rlogn}nr!„] 
.1 « 
It remains to deal with $P^*[\{T_1 + T_2 - T_3 > -5r\log n\} \cap \Omega_n]$:

\[
\begin{aligned}
P^*[\{T_1 + T_2 - T_3 > -5r\log n\} \cap \Omega_n]
&\le P^*[\{T_1 + T_2 - T_3 > -5r\log n\} \cap \Omega_n \cap \{|T_3| \le r\log n\}] \\
&\quad + P^*[\{|T_3| > r\log n\} \cap \Omega_n] \\
&\le P^*[\{T_1 + T_2 > -6r\log n\} \cap \Omega_n] + P^*[\{|T_3| > r\log n\} \cap \Omega_n] \\
&\le P^*[T_1 > -7r\log n] + P^*[\{|T_2| > r\log n\} \cap \Omega_n] \\
&\quad + P^*[\{|T_3| > r\log n\} \cap \Omega_n].
\end{aligned}
\]
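
The chain of inequalities above is a union bound built on a deterministic implication. Writing $a$ for $r \log n > 0$, if $T_1 + T_2 - T_3 > -5a$ then at least one of $T_1 > -7a$, $|T_2| > a$, or $|T_3| > a$ must hold. The following sketch tests this implication on randomly drawn values. The sampling ranges, the seed, and the function names are illustrative assumptions, and a random search of this kind can only fail to find a counterexample, not prove that none exists.

```python
import random


def implication_holds(t1, t2, t3, a):
    """True unless T1 + T2 - T3 > -5a while T1 <= -7a, |T2| <= a and |T3| <= a."""
    premise = t1 + t2 - t3 > -5 * a
    conclusion = (t1 > -7 * a) or (abs(t2) > a) or (abs(t3) > a)
    return (not premise) or conclusion


def count_violations(trials=20000, seed=0):
    """Count sampled triples (T1, T2, T3) violating the implication (expected: none)."""
    rng = random.Random(seed)
    violations = 0
    for _ in range(trials):
        a = rng.uniform(0.1, 5.0)  # stands for r * log(n), assumed positive
        t1, t2, t3 = (rng.uniform(-10 * a, 10 * a) for _ in range(3))
        if not implication_holds(t1, t2, t3, a):
            violations += 1
    return violations


print(count_violations())
```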



Upper bounding $T_1$ comes from Proposition B.4 and results in



P* [Pi > -7rlogn] < exp 
< exp 

For $T_2$, Lemma E.1 provides



r log n 
r log n 

2nr 



UK* 
UK* 



■ exp 



• exp 



-\D 



[K* 



l{K*? 
2Cc_ 



P*[{|r2|>rlogn}nr!„]<Q2 ^ (2nr)'= exp 

fc=[7/2nr] 



1 (rlogn)^ 
2(Cg)4 Anrvl 

+ Q^P* [4?ir > rlogn/ [2w„(CQ)2] ] 

< exp [ 8nr log n ] ■ exp 

log n 



1 (rlogn)^ 
'2(CQ)4 4nr?;2 



Similarly for T3, it results 



Vn > 



2nr 



8n(CQ)2 



P*[{\n\>rlogn}nnn]<Q^ J2 (2nr)'^exp 

fe=r7/2 nr] 



2 (rlogn)^ 



+ P* [ w„4nr > (r log n) /8 ] 



< exp [ 8nr log n ] ■ exp 
log n 



2 (rlogn)^ 
^ 47iru,^ 



32n 



From the previous bounds, one observes that requiring $v_n = o(\log n / n)$ makes
Vn > 



log 71 

32n 



and P* 



vanish as n grows, which leads to 



P* [ {Ti +T2-Ts>-5r log n} D n„ ] 

2 r(log n)^ "I 



< exp [ 8nr log n ] ■ cxp 
+ cxp [ Srir log n ] ■ cxp 



' cxp 



r log 77,- 



UK* 



Cr 



■ cxp 



82Q4 4„^;2 

1 r(logn) 
2(Cg)4 4ni;2 



2 1 



2Cr 



< Ci exp 



; log n — C- 



(log ?7.)^ 



for large enough values of $n$ and constants $C_1, C_2 > 0$ only depending on $Q$, $\zeta$,
$\gamma$, and $K^*$ but not on $z^*$.

Following the same line as in the proof of Theorem 3.1, for every $\epsilon > 0$ and
large enough values of $n$, it comes



^ P-^H([z,„]]) 



> e > n 17„ 



< 



r=l 



1 exp 



nvt. 



Ci([l + (Q-1K]"-1) , 



where $u_n = \exp\left[ 8n \log n - C_2 \frac{(\log n)^2}{n v_n^2} \right]$. Thus requiring $v_n = o(\sqrt{\log n}/n)$, it
comes

\[ [1 + (Q-1) u_n]^n = \exp[\, n \log(1 + (Q-1) u_n) \,] \le \exp[(Q-1)\, n u_n] \xrightarrow[n \to +\infty]{} 1, \]

which concludes the proof since no upper bound depends on $z^*$.
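
The last step combines a binomial sum with the inequality $\log(1+x) \le x$. Assuming, as in the proof of Theorem 3.1, that the sum over $r$ counts at most $\binom{n}{r}(Q-1)^r$ label vectors at distance $r$ from $z^*$ (this counting is an assumption of the sketch, since the display is not fully legible here), one gets $\sum_{r=1}^{n} \binom{n}{r} (Q-1)^r u^r = [1+(Q-1)u]^n - 1 \le \exp[(Q-1)nu] - 1$. The code below illustrates both facts numerically; all names and the numerical values are hypothetical.

```python
import math


def binomial_sum(n, Q, u):
    """Sum over r = 1..n of C(n, r) * ((Q - 1) * u) ** r."""
    return sum(math.comb(n, r) * ((Q - 1) * u) ** r for r in range(1, n + 1))


def closed_form(n, Q, u):
    """Closed form (1 + (Q - 1) u)^n - 1 given by the binomial theorem."""
    return (1 + (Q - 1) * u) ** n - 1


def exponential_bound(n, Q, u):
    """Upper bound exp((Q - 1) n u) - 1 obtained from log(1 + x) <= x."""
    return math.exp((Q - 1) * n * u) - 1


# Illustrative values only: n vertices, Q classes, and a small u standing for u_n.
n, Q, u = 30, 3, 1e-3
print(binomial_sum(n, Q, u), closed_form(n, Q, u), exponential_bound(n, Q, u))
```

The bound tends to $0$ as soon as $n u_n \to 0$, which is what the rate imposed on $v_n$ is meant to guarantee.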



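E.2. Lemma E.1

Lemma E.1. Let $\pi \in \mathcal{M}_Q(\mathbb{R})$ denote a matrix with coefficients $\pi_{q,l}$ belonging to
$[0,1]$, and $z_{[n]}$ and $z^*_{[n]}$ be two label vectors such that $\sum_{i=1}^{n} \mathbb{1}(z_i \neq z^*_i) = r$. Then,

\[ \big|\{ (i,j) \mid i \neq j,\ \pi_{z_i,z_j} \neq \pi_{z^*_i,z^*_j} \}\big| \le 2nr. \]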
Proof of Lemma E.1. Without loss of generality, one can assume the first $r$ coordinates
of $z_{[n]}$ are different from those of $z^*_{[n]}$. Then, any difference between
$\pi_{z_i,z_j}$ and $\pi_{z^*_i,z^*_j}$ can only occur if $(z_i,z_j) \neq (z^*_i,z^*_j)$. It results

\[ \big|\{ (i,j) \mid i \neq j,\ \pi_{z_i,z_j} \neq \pi_{z^*_i,z^*_j} \}\big| \le nr + (n-r)r \le 2nr. \]

□ 
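
The counting argument of Lemma E.1 can be checked by brute force on small instances. The sketch below draws random connectivity matrices and label vectors, counts the ordered pairs $(i,j)$, $i \neq j$, whose connection probability changes between the two labelings, and compares the count with $2nr$, where $r$ is the number of differing coordinates. Function names, sizes, and seeds are illustrative assumptions.

```python
import random


def count_changed_pairs(pi, z, z_star):
    """Number of ordered pairs (i, j), i != j, with pi[z_i][z_j] != pi[z*_i][z*_j]."""
    n = len(z)
    return sum(1 for i in range(n) for j in range(n)
               if i != j and pi[z[i]][z[j]] != pi[z_star[i]][z_star[j]])


def hamming(z, z_star):
    """Number of coordinates where the two label vectors differ."""
    return sum(1 for a, b in zip(z, z_star) if a != b)


def lemma_e1_holds(n, Q, trials, seed=0):
    """Check the bound count <= 2 n r on random instances."""
    rng = random.Random(seed)
    for _ in range(trials):
        pi = [[rng.random() for _ in range(Q)] for _ in range(Q)]
        z = [rng.randrange(Q) for _ in range(n)]
        z_star = [rng.randrange(Q) for _ in range(n)]
        r = hamming(z, z_star)
        if count_changed_pairs(pi, z, z_star) > 2 * n * r:
            return False
    return True


print(lemma_e1_holds(n=8, Q=3, trials=200))
```

With $n = 3$, a single differing label, and a matrix whose entries are all distinct, exactly $r(n-1) + (n-r)r = 4$ pairs change, which sits below the bound $2nr = 6$.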



Lemma E.2. With the same notation as Proposition 3.8 and the assumptions
of Theorem 3.9, the maximum likelihood estimator of $\alpha$ is given by

I " 



Proof of Lemma E.2. Let us introduce some notation: 

• (^[n] , 7^) ^ /X[„] {Z[n] , Tt) = Ci (X[„] ; Z[„] , Tt) , 

• (a, tt) /x[„, (a, tt) = £3 (-^[„] ; tt) , 

• (a,7r) 1^- /x[„,,Z[„j(a, tt) = /x[„, (^'jn], Tr)/^^^^ (a) denote the complete like- 
lihood of (a, tt). 

We start computing the derivative of $\mu_{[n]}(\alpha, \pi) + \lambda\left(\sum_{q} \alpha_q - 1\right)$ with respect
to $\alpha_k$, for $1 \le k \le Q$ and $\lambda \in \mathbb{R}$.



d 



where $N_k(z_{[n]}) = \sum_{i=1}^{n} \mathbb{1}(z_i = k)$. Multiplying by $\alpha_k$ and summing over $k$ leads to

\[ \lambda = -n\, \mu_{[n]}(\hat\alpha, \pi), \]

where $\hat\alpha$ denotes the optimum location of $\alpha$ (for which the derivative vanishes).
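
The value of the multiplier follows from a homogeneity argument. Assuming that the likelihood has the mixture form $\mu(\alpha) = \sum_{z} w(z) \prod_{i=1}^{n} \alpha_{z_i}$ with nonnegative weights $w(z)$ not depending on $\alpha$ (an assumption of this sketch), $\mu$ is homogeneous of degree $n$ in $\alpha$, so that $\sum_k \alpha_k\, \partial\mu/\partial\alpha_k = n\,\mu(\alpha)$. The code below checks this with finite differences; the weights are random placeholders rather than actual SBM likelihood terms, and all names are hypothetical.

```python
import itertools
import math
import random


def random_weights(n, Q, seed=0):
    """Placeholder positive weights w(z), one per label vector z in {0..Q-1}^n."""
    rng = random.Random(seed)
    return {z: rng.uniform(0.1, 1.0) for z in itertools.product(range(Q), repeat=n)}


def likelihood(alpha, weights):
    """mu(alpha) = sum over z of w(z) * prod_i alpha[z_i]."""
    return sum(w * math.prod(alpha[q] for q in z) for z, w in weights.items())


def euler_sum(alpha, weights, h=1e-6):
    """sum_k alpha_k * d mu / d alpha_k, with central finite differences."""
    total = 0.0
    for k in range(len(alpha)):
        up = list(alpha)
        down = list(alpha)
        up[k] += h
        down[k] -= h
        derivative = (likelihood(up, weights) - likelihood(down, weights)) / (2 * h)
        total += alpha[k] * derivative
    return total


n, Q = 4, 3
weights = random_weights(n, Q)
alpha = [0.2, 0.3, 0.5]
print(euler_sum(alpha, weights), n * likelihood(alpha, weights))
```

Both printed numbers should agree up to finite-difference error, which mirrors the step "multiplying by $\alpha_k$ and summing over $k$".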
It results for every $k$



= E 



n /x[„,(a,7r) 



where $P_{\hat\alpha,\pi}[Z_{[n]} = z_{[n]} \mid X_{[n]}]$ (Section 3.1.2) denotes the a posteriori
probability of $Z_{[n]} = z_{[n]}$ given $X_{[n]}$ with parameters $(\hat\alpha, \pi)$.
Finally, the result comes from



1 " 

"fc = - I] H l(..=fc)/J"' (S, tt) 

1 " 

1 " 



, , , , (z.=fc)Ps,7r [^[ri] — Z[n] \ ^[n] _ 
1=1 Zr, 



1=1 



Replacing $\pi$ by the MLE $\hat\pi$ of $\pi^*$, the MLE of $\alpha^*$ satisfies for every $k$

-. n 1 



i=l i=l 



□ 
Appendix F: Proof of Theorem 4.2

Lemma F.1. Let $\hat z_{[n]} = \hat z_{[n]}(\pi) = \mathrm{Argmax}_{z_{[n]}} \mathcal{L}_1(X_{[n]}; z_{[n]}, \pi)$. For every $X_{[n]} \in
\mathcal{X}_n$, $(\alpha,\pi) \in \Theta$, and $\tau_{[n]} \in \mathcal{S}_n$, it comes that

\[ \mathcal{J}(X_{[n]}; \tau_{[n]}, \alpha, \pi) \le \mathcal{L}_2(X_{[n]}; \alpha, \pi) \le \mathcal{L}_1(X_{[n]}; \hat z_{[n]}, \pi). \]



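Proof of Lemma F.1. The first inequality results from the definition of $\mathcal{J}$ given
by Eq. (10).

The second one comes from $\hat z_{[n]}(\pi) = \mathrm{Argmax}_{z_{[n]}} \mathcal{L}_1(X_{[n]}; z_{[n]}, \pi)$. Thus for
every $(\alpha,\pi)$,

\[ \mathcal{L}_2(X_{[n]}; \alpha, \pi) \le \log\left\{ e^{\mathcal{L}_1(X_{[n]}; \hat z_{[n]}, \pi)} \sum_{z_{[n]}} P_{\alpha}(Z_{[n]} = z_{[n]}) \right\} \le \mathcal{L}_1(X_{[n]}; \hat z_{[n]}, \pi). \]

□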



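Lemma F.2. Lemma F.1 and Assumption (A3) entail that there exists $0 <
\gamma < 1$ such that for every $(\alpha, \pi)$,

\[ |\mathcal{L}_2(X_{[n]}; \alpha, \pi) - \mathcal{L}_1(X_{[n]}; \hat z_{[n]}, \pi)| \le n \log(1/\gamma), \]

\[ |\mathcal{J}(X_{[n]}; \hat\tau_{[n]}, \alpha, \pi) - \mathcal{L}_1(X_{[n]}; \hat z_{[n]}, \pi)| \le n \log(1/\gamma). \]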



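Proof of Lemma F.2. From Lemma F.1 and the definition of $\hat\tau_{[n]}$ it comes for every
$(\alpha,\pi)$:

\[ \mathcal{J}(X_{[n]}; \hat z_{[n]}, \alpha, \pi) \le \mathcal{J}(X_{[n]}; \hat\tau_{[n]}, \alpha, \pi) \le \mathcal{L}_2(X_{[n]}; \alpha, \pi) \le \mathcal{L}_1(X_{[n]}; \hat z_{[n]}, \pi). \]

Combined with $\mathcal{J}(X_{[n]}; \hat z_{[n]}, \alpha, \pi) = \mathcal{L}_1(X_{[n]}; \hat z_{[n]}, \pi) + \sum_{i=1}^{n} \log \alpha_{\hat z_i}$ (see
Lemma F.3), it leads to both

\[ |\mathcal{L}_2(X_{[n]}; \alpha, \pi) - \mathcal{L}_1(X_{[n]}; \hat z_{[n]}, \pi)| \le -\sum_{i=1}^{n} \log \alpha_{\hat z_i}, \]

\[ |\mathcal{J}(X_{[n]}; \hat\tau_{[n]}, \alpha, \pi) - \mathcal{L}_1(X_{[n]}; \hat z_{[n]}, \pi)| \le -\sum_{i=1}^{n} \log \alpha_{\hat z_i}. \]

Assumption (A3) yields the conclusion. □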






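Lemma F.3. With the same notation as Theorem 4.2, let $\hat z_{[n]} = \hat z_{[n]}(\pi) =
\mathrm{Argmax}_{z_{[n]}} \mathcal{L}_1(X_{[n]}; z_{[n]}, \pi)$. Then for every $(\alpha, \pi)$,

\[ \mathcal{J}(X_{[n]}; \hat z_{[n]}, \alpha, \pi) = \mathcal{L}_1(X_{[n]}; \hat z_{[n]}, \pi) + \sum_{i=1}^{n} \log \alpha_{\hat z_i}. \]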

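Proof of Lemma F.3. First, let us recall Eq. (10)

\[
\begin{aligned}
\mathcal{J}(X_{[n]}; \tau_{[n]}, \alpha, \pi) &= \mathcal{L}_2(X_{[n]}; \alpha, \pi) - K(D_{\tau_{[n]}}, P^{X_{[n]}}) \\
&= \log[ f(X_{[n]}; \alpha, \pi) ] - K(D_{\tau_{[n]}}, P^{X_{[n]}}),
\end{aligned}
\]

where $(\alpha, \pi) \mapsto f(X_{[n]}; \alpha, \pi)$ denotes the likelihood of $(\alpha,\pi)$.
Second, Eq. (9) and simple calculations lead to

\[ K(D_{\tau_{[n]}}, P^{X_{[n]}}) = \sum_{z_{[n]}} D_{\tau_{[n]}}(z_{[n]}) \log D_{\tau_{[n]}}(z_{[n]}) - \sum_{z_{[n]}} D_{\tau_{[n]}}(z_{[n]}) \log\left[ \frac{f(X_{[n]}, z_{[n]}; \alpha, \pi)}{f(X_{[n]}; \alpha, \pi)} \right], \]

where $(\alpha,\pi) \mapsto f(X_{[n]}, z_{[n]}; \alpha, \pi)$ denotes the complete-likelihood of $(\alpha,\pi)$.
Then,

\[ K(D_{\tau_{[n]}}, P^{X_{[n]}}) = \sum_{i,q} \tau_{i,q} \log \tau_{i,q} - \sum_{z_{[n]}} D_{\tau_{[n]}}(z_{[n]}) \left[ \mathcal{L}_1(X_{[n]}; z_{[n]}, \pi) + \log f(z_{[n]}; \alpha) \right] + \mathcal{L}_2(X_{[n]}; \alpha, \pi), \]

where $\alpha \mapsto f(z_{[n]}; \alpha) = \prod_{i=1}^{n} \alpha_{z_i}$. Hence,

\[
\begin{aligned}
K(D_{\tau_{[n]}}, P^{X_{[n]}}) = \sum_{i,q} \tau_{i,q} \log \tau_{i,q}
&- \sum_{i \neq j} \sum_{q,l} \left[ X_{i,j} \log \pi_{q,l} + (1 - X_{i,j}) \log(1 - \pi_{q,l}) \right] \tau_{i,q} \tau_{j,l} \\
&- \sum_{i,q} \tau_{i,q} \log \alpha_q + \mathcal{L}_2(X_{[n]}; \alpha, \pi).
\end{aligned}
\]

Therefore for every $\tau_{[n]}$,

\[ \mathcal{J}(X_{[n]}; \tau_{[n]}, \alpha, \pi) = \sum_{i \neq j} \sum_{q,l} \left[ X_{i,j} \log \pi_{q,l} + (1 - X_{i,j}) \log(1 - \pi_{q,l}) \right] \tau_{i,q} \tau_{j,l} - \sum_{i,q} \tau_{i,q} (\log \tau_{i,q} - \log \alpha_q). \]

With $\tau_{[n]} = \hat z_{[n]}$ (that is, $\tau_{i,q} = \mathbb{1}(\hat z_i = q)$, with the convention $0 \log 0 = 0$), it comes
$\mathcal{J}(X_{[n]}; \hat z_{[n]}, \alpha, \pi) = \mathcal{L}_1(X_{[n]}; \hat z_{[n]}, \pi) + \sum_{i=1}^{n} \log \alpha_{\hat z_i}$,
which concludes the proof. □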
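
Lemmas F.1 to F.3 can be illustrated by exhaustive enumeration on a very small directed graph. The sketch below computes $\mathcal{L}_1$, $\mathcal{L}_2$, and the variational criterion $\mathcal{J}$ (in the expanded form obtained in the proof of Lemma F.3) and returns the quantities needed to compare them. The parameter values, the graph size, the random seed, and all function names are illustrative assumptions; the enumeration over $Q^n$ label vectors is only feasible for tiny $n$.

```python
import itertools
import math
import random


def log_bernoulli(x, p):
    """log-density of a Bernoulli(p) variable at x in {0, 1}."""
    return math.log(p) if x == 1 else math.log(1.0 - p)


def cond_loglik(X, z, pi):
    """L1(X; z, pi): conditional log-likelihood given the labels z."""
    n = len(z)
    return sum(log_bernoulli(X[i][j], pi[z[i]][z[j]])
               for i in range(n) for j in range(n) if i != j)


def loglik(X, alpha, pi):
    """L2(X; alpha, pi): log-likelihood, by summing over all label vectors."""
    n, Q = len(X), len(alpha)
    total = sum(math.exp(cond_loglik(X, z, pi)) * math.prod(alpha[q] for q in z)
                for z in itertools.product(range(Q), repeat=n))
    return math.log(total)


def variational_criterion(X, tau, alpha, pi):
    """J(X; tau, alpha, pi) for a product distribution with marginals tau."""
    n, Q = len(X), len(alpha)
    energy = sum(tau[i][q] * tau[j][l] * log_bernoulli(X[i][j], pi[q][l])
                 for i in range(n) for j in range(n) if i != j
                 for q in range(Q) for l in range(Q))
    penalty = sum(tau[i][q] * (math.log(tau[i][q]) - math.log(alpha[q]))
                  for i in range(n) for q in range(Q) if tau[i][q] > 0.0)
    return energy - penalty


def run_check(seed=0, n=4):
    """Return the quantities compared in Lemmas F.1, F.2 and F.3."""
    rng = random.Random(seed)
    alpha = [0.3, 0.7]                 # hypothetical class proportions
    pi = [[0.8, 0.2], [0.3, 0.6]]      # hypothetical connectivity matrix
    Q = len(alpha)
    X = [[rng.randint(0, 1) if i != j else 0 for j in range(n)] for i in range(n)]
    raw = [[rng.random() + 0.1 for _ in range(Q)] for _ in range(n)]
    tau = [[v / sum(row) for v in row] for row in raw]   # arbitrary product law
    z_hat = max(itertools.product(range(Q), repeat=n),
                key=lambda z: cond_loglik(X, z, pi))
    dirac = [[1.0 if q == zi else 0.0 for q in range(Q)] for zi in z_hat]
    return {
        "J_tau": variational_criterion(X, tau, alpha, pi),
        "J_zhat": variational_criterion(X, dirac, alpha, pi),
        "L1": cond_loglik(X, z_hat, pi),
        "L2": loglik(X, alpha, pi),
        "sum_log_alpha": sum(math.log(alpha[q]) for q in z_hat),
        "n_log_inv_gamma": n * math.log(1.0 / min(alpha)),
    }


print(run_check())
```

In the returned dictionary, one expects `J_tau <= L2 <= L1`, `J_zhat` equal to `L1 + sum_log_alpha`, and `L1 - L2` no larger than `n_log_inv_gamma`, where $\gamma$ is taken here as the smallest class proportion.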



Lemma F.4. 



Proof of Lemma F.4.



TV 



< 



< \I^K{Dr„,,P 



- log 

2 ^ 



P{2 



□ 